Udacity MLND Notebook

[TOC]

Welcome to the Nanodegree

Get started with learning about your Nanodegree: an introduction to Decision Trees, Naive Bayes, Linear and Logistic Regression, and Support Vector Machines. You can join the MLND student community by following this link and registering your email: https://mlnd-slack.udacity.com

WELCOME TO THE NANODEGREE

Welcome to MLND

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HG5IYufgDAo.mp4


Program Readiness

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/dc9CmcGTnr0.mp4


What is Machine Learning?

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/K45QM8Wi7BU.mp4


Machine Learning vs. Traditional Coding

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_N2iIB_bLXA.mp4


Applications of Machine Learning

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kIM5D_W6Mh8.mp4


Connections to GA Tech

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/DysCmGKRpvs.mp4


Program Outline

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/m0cIDrRWyLw.mp4


What is ML

Introduction to Machine Learning

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/bYeteZQrUcE.mp4


Decision Trees

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1RonLycEJ34.mp4


Decision Trees Quiz


QUIZ QUESTION

Between Gender and Age, which one seems more decisive for predicting which app the users will download?

  • Gender
  • Age


Decision Trees Answer

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/h8zH47iFhCo.mp4


Naive Bayes

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jsLkVYXmr3E.mp4


Naive Bayes Quiz


QUIZ QUESTION

If an e-mail contains the word “cheap”, what is the probability of it being spam?

  • 40%
  • 60%
  • 80%


Naive Bayes Answer

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/YKN-fjuZ1VU.mp4


Gradient Descent

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/BEC0uH1fuGU.mp4


Linear Regression Quiz

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/sf51L0RN6zc.mp4

QUIZ QUESTION

What’s the best estimate for the price of a house?

  • 80k
  • 120k
  • 190k


Linear Regression Answer

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/L5QBqYDNJn0.mp4


Logistic Regression Quiz

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wQXKdeVHTmc.mp4

QUIZ QUESTION

Does the student get Accepted?

  • Yes
  • No


Logistic Regression Answer

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/JuAJd9Qvs6U.mp4


Support Vector Machines

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Fwnjx0s_AIw.mp4


Support Vector Machines Quiz


QUIZ QUESTION

Which one is a better line?

  • The yellow line
  • The blue line


Support Vector Machines Answer

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/JrUtTwfnsfM.mp4


Neural Networks

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xFu1_2K2D2U.mp4


Kernel Method

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/x0JqH6-Dhvw.mp4


Kernel Method Quiz


QUIZ QUESTION

Which equation could come to our rescue?

  • x+y
  • xy
  • x^2


Kernel Method Answer

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/dRFd6HaAXys.mp4


Recap and Challenge

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ecREasTrKu4.mp4


K-means Clustering

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/pv_i08zjpQw.mp4


Hierarchical Clustering

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1PldDT8AwMA.mp4


Practice Project: Detect Spam

Practice Project: Using Naive Bayes to detect spam.

From time to time you will be encouraged to work on practice projects aimed at deepening your understanding of the concepts being taught. In this practice project, you will implement the Naive Bayes algorithm to detect spam text messages (as taught by Luis earlier in the lesson) using an open source dataset.

Here is the notebook, the solutions are included.


Summary

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hJEuaOUu2yA.mp4


MLND Program Orientation

Before the Program Orientation

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/73CdKtS-IwU.mp4


Introduction

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/fxNSn63xFvA.mp4


Projects and Progress

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Z9ZLMQWsbsk.mp4


Career Development

Being enrolled in one of Udacity’s Nanodegree programs has many career-related perks. Our goal is to help you take your learning from this program and apply it in the real world and in your career.

As you venture through the Machine Learning Engineer Nanodegree program, you’ll have the opportunity to

  • Update your resume through a peer-reviewed process using conventions that recruiters expect and get tips on how to best represent yourself to pass the “6 second screen”;
  • Create a cover letter that portrays your soft and hard skills, and most importantly your passion for a particular job that you are interested in applying to;
  • Get your GitHub and LinkedIn profiles reviewed through the lens of a recruiter or hiring manager, focusing on how your profile, projects, code, education and past experiences represent you as a potential candidate;
  • Practice for a technical interview with a professional reviewer on a variety of topics;
  • And more!

You can also find career workshops that Udacity has hosted over the years, where you can gain a plethora of information to prepare you for your ventures into a career. Udacity also provides job placement opportunities with many of our industry partners. To take advantage of this opportunity, fill out the career section of your Udacity professional profile, so we know more about you and your career goals! If all else fails, you can always default to emailing the career team at career-support@udacity.com.


Connecting with Your Community

Your Nanodegree community will play a huge role in supporting you when you get stuck and in helping you deepen your learning. Getting to know your fellow students will also make your experience a lot more fun!

To ask and answer questions, and to contribute to discussions, head to your program forum. You can get there by clicking the Discussion link in the classroom and in the Resources tab in your Udacity Home. You can search to see if someone has already asked a question related to yours, or you can make a new post if no one has. Chances are, someone else is wondering about the same thing you are, so don’t be shy!

In addition, students may connect with one another through Slack, a team-oriented chat program. You can join the MLND Slack student community by following this link and registering your email. There are many content-related channels where you can speak with students about a particular concept, and even discuss your first week in the program using the #first-week-experience channel. In addition, you can talk with MLND graduates and alumni to get a live perspective on the program in the #ask-alumni channel! You can find the student-run community wiki here.


Support from the Udacity Team

The Udacity team is here to help you reach your Nanodegree program goals! You can interact with us in the following ways:

  • Forums: Along with your student community, the Udacity team maintains a strong presence in the forum to help make sure your questions get answered and to connect you with other useful resources.
  • 1-on-1 Appointments: If you get stuck working on a project in the program, our mentors are here to help! You can set up a half-hour appointment with a mentor available for the project at a time you choose to get assistance.
  • Project Reviews: During the project submission process, your submissions will be reviewed by a qualified member of our support team, who will provide comments and helpful feedback on where your submission is strongest, and where your submission needs improvement. The reviews team will support your submissions all the way up to meeting specifications!
  • By email: You can always contact the Machine Learning team with support-related questions using machine-support@udacity.com. Please make sure that you have exhausted all other options before doing so!
    Find out more about the support we offer using the Resources tab in your Udacity Nanodegree Home.

How Does Project Submission Work?

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jCJa_VP6qgg.mp4


Integrity and Mindset

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/zCOr3O50gQM.mp4


How Do I Find Time for My Nanodegree?

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/d-VfUw7wNEQ.mp4
All calendar applications now let you set up a weekly reminder. I have included a screen capture below of how to set one up in Google Calendar. We recommend coming into the classroom at least twice a week. It is a best practice to set up at least one repeating weekly reminder to continue the Nanodegree program.


Final Tips

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1ZVBvM54hQw.mp4


Wrapping Up the Program Orientation

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xujb3Rqxuog.mp4
You now have all the info you need to proceed on your Nanodegree journey!

  • If you have any further questions, perhaps about payment or enrollment, read your Nanodegree Student Handbook for more details.
  • Download Udacity’s mobile app to learn on the go!
  • Remember to put in time consistently, engage with your community, take advantage of the resources available to you, and give us feedback throughout the program.
    We are so glad to have you with us! Return to your Udacity home to keep learning. Good luck!

(Optional) Exploratory Project

Software Requirements

  1. Press Windows + R to open a command prompt.
  2. Type pip install --user pandas jupyter.
  3. Oops, error: Microsoft Visual C++ 9.0 is required. Get it from http://aka.ms/vcpython27
  4. Download and install it.
  5. Installation succeeds.

Starting the Project

First try

  1. Press Windows + R to open a command prompt.
  2. Type cd <path>; my path is G:\Udacity\MLND\machine-learning-master\projects\titanic_survival_exploration.
  3. Type g: to switch to that drive.
  4. Type bash jupyter notebook titanic_survival_exploration.ipynb; it shows "'bash' is not recognized as an internal or external command, operable program or batch file."
  5. Failed.

Second try

  1. Open Git Bash.
  2. cd to the path, using forward slashes: G:/Udacity/MLND/machine-learning-master/projects/titanic_survival_exploration.
  3. Type bash jupyter notebook titanic_survival_exploration.ipynb.
  4. Failed.

Third try

  1. Install Anaconda.
  2. Press Windows + R to open a command prompt.
  3. Type cd <path>; my path is G:\Udacity\MLND\machine-learning-master\projects\titanic_survival_exploration.
  4. Type g: to switch to that drive.
  5. Type jupyter notebook titanic_survival_exploration.ipynb.
  6. Done.

Fourth try

  1. Open Git Bash.
  2. cd to the path, using forward slashes: G:/Udacity/MLND/machine-learning-master/projects/titanic_survival_exploration.
  3. Type jupyter notebook titanic_survival_exploration.ipynb.
  4. Done.

ipython notebook

Markdown

Question 4 (stay tuned):

  1. Pclass == 3

Career: Orientation

Throughout your Nanodegree program, you will see Career Development Lessons and Projects that will help ensure you’re presenting your new skills best during your job search. In this short lesson, meet the Careers team and learn about the career resources available to you as a Nanodegree student.

If you are a Nanodegree Plus student, Career Content and Career Development Projects are required for graduation.

If you are enrolled in a standard Nanodegree program, Career Content and Career Development Projects are optional and do not affect your graduation.

ORIENTATION

Career Services Available to You

Meet the Careers Team

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/oR1IxPTTz0U.mp4

Resources


Your Udacity Profile

Connect to Hiring Partners through your Udacity Professional Profile

In addition to the Career Lessons and Projects you’ll find in your Nanodegree program, you have a Udacity Professional Profile linked in the left sidebar.
Your Udacity Professional Profile features important, professional information about yourself. When you make your profile public, it becomes accessible to our Hiring Partners, as well as to recruiters and hiring managers who come to Udacity to hire skilled Nanodegree graduates.

As you complete projects in your Nanodegree program, they will be automatically added to your Udacity Professional Profile to ensure you’re able to show employers the skills you’ve gained through the program. In order to differentiate yourself from other candidates, make sure to go in and customize those project cards. In addition to these projects, be sure to:

  • Keep your profile updated with your basic info and job preferences, such as location
  • Ensure you upload your latest resume
  • Return regularly to your Profile to update your projects and ensure you’re showcasing your best work

If you are looking for a job, make sure to keep your Udacity Professional Profile updated and visible to recruiters!


Model Evaluation and Validation

Apply statistical analysis tools to model observed data, and gauge how well your models perform.

Project: Predicting Boston Housing Prices

For most students, this project takes approximately 8 - 15 hours to complete (about 1 - 3 weeks).

P1 Predicting Boston Housing Prices

STATISTICAL ANALYSIS

Intro: Model Evaluation and Validation

Intro to Model Evaluation and Validation

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/cseqEWRDs5Q.mp4


Model Evaluation What You’ll Watch

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jYZO17CeZDI.mp4


Model Evaluation What You’ll Learn

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ZLOucNwuqCk.mp4


Course Outline - Fitting It All Together

The ultimate goal of Machine Learning is to have data models that can learn and improve over time. In essence, machine learning is making inferences about new data based on previous examples.

In this first section we review some basic statistics and numerical tools to manipulate & process our data.

Then we will move on to modeling data; reviewing different data types and seeing how they play out in the case of one specific dataset. The section ends by introducing the basic tool of a supervised learning algorithm.

Next, we’ll see how to use our dataset for both training and testing data, and review various tools for how to evaluate how well an algorithm performs.

Finally, we’ll look at the reasons that errors arise, and the relationship between adding more data and adding more complexity in getting good predictions. The last section ends by introducing cross validation, a powerful meta-tool for helping us use our tools correctly.


Model Evaluation What You’ll Do

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kJCAuHjWiOA.mp4


Prerequisites

Statistics Review & Supporting Libraries

In this section we will go over some prerequisites for this course, review basic statistics concepts and problem sets, and finally teach you how to use some useful data analysis Python libraries to explore real-life datasets using the concepts you reviewed earlier.


Prerequisites

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0ANDJ8i_deE.mp4
Here are shortcuts to the prerequisite statistics courses:

Udacity’s descriptive stats course [link]

Udacity’s inferential stats course [link]

You will also need to have just a little bit of git experience — enough to check out our code repository. If you’ve ever used git before, you should be fine. If this is truly your first time with git, once you get to the first mini-project, you may want to quickly look at the first lesson of Udacity’s git course.

  1. mode, mean
  2. variance, standard deviation
  3. Bessel's correction: use n-1 instead of n
  4. sample standard deviation (SD)
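To make the n versus n-1 distinction from these notes concrete, here is a minimal numpy sketch (the sample values are made up for illustration):

import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical sample

# Population standard deviation: divide the summed squared deviations by n
population_sd = np.std(data, ddof=0)

# Sample standard deviation with Bessel's correction: divide by n - 1
sample_sd = np.std(data, ddof=1)

print(population_sd)  # 2.0
print(sample_sd)      # roughly 2.14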

Measures of Central Tendency

Introduction: Topics Covered

Measures of Central Tendency

In this lesson, we will cover the following topics:

  • Mean
  • Median
  • Mode

This lesson is meant to be a refresher for those who have no statistics background; if you are familiar with these concepts, you may skip this lesson.

Which Major?

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/mIzPoh_kqw4.mp4

Quiz: Which Major?
Enter your answers as a number with no commas or symbols ($). Enter the number in thousands (5 digits)


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/t0yIetl9ZxI.mp4


One Number to Describe Data

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6QfvhQ0En0E.mp4


Which Number to Choose?

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/7E7Czixpviw.mp4


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TSw2AAaKxBA.mp4


Mode of Dataset

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/80BAbiEWsaY.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HvOjcTFlVTI.mp4


Mode of Distribution

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/47JDwoDUxP8.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/s9dHF4MMGx0.mp4


Mode - Negatively Skewed Distribution

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xBnhUJENAtk.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/oNRtSJvtkJc.mp4


Mode - Uniform Distribution

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TE2BZql64XY.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/mFm0FfWHlXw.mp4


More than One Mode?

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1GuHNqJNY2M.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/9bsmO7cKzPk.mp4


Mode of Categorical Data

Quiz
The data doesn’t need to be numeric to find a mode: we can compute the mode for categorical data as well! On the next slide, you’ll be asked to find the mode of a categorical data set: the preferred M&M flavor of 8,000 Udacity students.


Answer
Remember, the mode occurs on the X-axis, so you are looking for whatever value has the highest frequency.

The numbers 7,000 and 1,000 are the actual frequencies. The mode itself is “Plain.”
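As a quick illustration, pandas can compute the mode of categorical data directly. The sketch below assumes the two flavor categories are “Plain” and “Peanut” (only “Plain” and the 7,000/1,000 frequencies come from the quiz; the second category name is a guess):

import pandas as pd

# Hypothetical recreation of the poll data: 7,000 "Plain" votes and 1,000 others
flavors = pd.Series(['Plain'] * 7000 + ['Peanut'] * 1000)

print(flavors.value_counts())  # the frequency of each category
print(flavors.mode()[0])       # 'Plain' -- the most frequent category, i.e. the mode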


More o’ Mode!

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wBduq7St2Ak.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/74mudn321tA.mp4


Find the Mean

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/S4KbzIyEwV8.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/El5cY2jlzuM.mp4


Procedure for Finding Mean

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/nRurXCTYxG4.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lK2lDLdE6iA.mp4


Iterative Procedure

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/r2s9INGd-Ls.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/NcWS_BvM3IU.mp4


Helpful Symbols

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/w4_8YCp-9fI.mp4


Properties of the Mean

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/uSXJtEpwEVM.mp4

Quiz: Properties Of The Mean
Please note: the last option, “The mean will change if we add an extreme value to the dataset.”, is not necessarily a property of the mean but more a behavioral tendency. For the purposes of this quiz, however, you can mark it as a property.



https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/AqlvTMZg6HY.mp4


Mean with Outlier


What Can You Expect?


UNC


Requirement for Median


Find the Median


Median with Outlier


Find Median with Outlier


Measures of Center


Order Measures of Center 1


Order Measures of Center 2


Use Measures of Center to Compare

Link to poll: How many Facebook friends do you have?
Link to poll results

Mashable: Who is an Average Facebook User?
Zeebly Social Me (optional but fun use of statistics)


Udacians’ Facebook Friends - Mean

Link to Udacians’ Facebook Friends
Copy and paste the data into your own spreadsheet to perform the calculations. From Google Drive (at the top of the page once you’re signed in to your Google account), click the button on the left that says “CREATE” and click “Spreadsheet.” Please round your answer to two decimal places.


Udacians’ Facebook Friends - Median

Link to Udacians’ Facebook Friends

Copy and paste the data into your own spreadsheet to perform the calculations. From Google Drive (at the top of the page once you’re signed in to your Google account), click the button on the left that says “CREATE” and click “Spreadsheet.”


Formula for Location of Median


Wrap Up - Measures of Center

Quiz: Wrap Up - Measures Of Center
Here is a short doc outlining Mean, Median, and Mode. http://tinyurl.com/measureOfCenter


Good Job!


Variability of Data

Introduction: Topics Covered

In this lesson, we will cover the following topics:

  • Inter Quartile Range
  • Outliers
  • Standard Deviation
  • Bessel’s Correction

This lesson is meant to be a refresher for those who have no statistics background; if you are familiar with these concepts, you may skip this lesson.


Social Networkers’ Salaries


Should You Get an Account?


What’s the Difference?


Quantify Spread


Does Range Change?


Mark Z the Outlier


Chop Off the Tails


Where Is Q1?


Q3 - Q1


IQR


What Is an Outlier?


Define Outlier


Match Boxplots


Mean Within IQR?


Problem with IQR


Measure Variability


Calculate Mean


Deviation from Mean


Average Deviation


Equation for Average Deviation


Be Happy and Get Rid of Negatives


Absolute Deviations


Average Absolute Deviation


Formula for Avg. Abs. Dev.


Squared Deviations


Sum of Squares


Average Squared Deviation



Numpy & Pandas Tutorials

Numpy and Pandas Tutorials

Now that you have reviewed some basic statistics, let’s go over some Python libraries that allow you to explore data and process large datasets.

Specifically, we will go over numpy, which allows us to process large amounts of numerical data, and pandas Series and DataFrames, which allow us to store large datasets and extract information from them.

Numpy Library Documentation: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

Pandas Library Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/

We highly recommend going through this resource by Justin Johnson if you have not worked with Numpy before.

Another great resource is the SciPy-lectures series on this topic.


Numpy

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/l_Tzjxfa_5g.mp4


Numpy Playground

Here’s the code

import numpy as np

'''
The following code is to help you play with Numpy, which is a library 
that provides functions that are especially useful when you have to
work with large arrays and matrices of numeric data, like doing 
matrix-matrix multiplications. Also, Numpy is battle tested and 
optimized so that it runs fast, much faster than if you were working
with Python lists directly.
'''

'''
The array object class is the foundation of Numpy, and Numpy arrays are like
lists in Python, except that everything inside an array must be of the
same type, like int or float.
'''
# Change False to True to see Numpy arrays in action
if False:
    array = np.array([1, 4, 5, 8], float)
    print array
    print ""
    array = np.array([[1, 2, 3], [4, 5, 6]], float)  # a 2D array/Matrix
    print array

'''
You can index, slice, and manipulate a Numpy array much like you would with a
Python list.
'''
# Change False to True to see array indexing and slicing in action
if False:
    array = np.array([1, 4, 5, 8], float)
    print array
    print ""
    print array[1]
    print ""
    print array[:2]
    print ""
    array[1] = 5.0
    print array[1]

# Change False to True to see Matrix indexing and slicing in action
if False:
    two_D_array = np.array([[1, 2, 3], [4, 5, 6]], float)
    print two_D_array
    print ""
    print two_D_array[1][1]
    print ""
    print two_D_array[1, :]
    print ""
    print two_D_array[:, 2]

'''
Here are some arithmetic operations that you can do with Numpy arrays
'''
# Change False to True to see Array arithmetics in action
if False:
    array_1 = np.array([1, 2, 3], float)
    array_2 = np.array([5, 2, 6], float)
    print array_1 + array_2
    print ""
    print array_1 - array_2
    print ""
    print array_1 * array_2

# Change False to True to see Matrix arithmetics in action
if False:
    array_1 = np.array([[1, 2], [3, 4]], float)
    array_2 = np.array([[5, 6], [7, 8]], float)
    print array_1 + array_2
    print ""
    print array_1 - array_2
    print ""
    print array_1 * array_2

'''
In addition to the standard arithmetic operations, Numpy also has a range of
other mathematical operations that you can apply to Numpy arrays, such as
mean and dot product.
Both of these functions will be useful in later programming quizzes.
'''
if True:
    array_1 = np.array([1, 2, 3], float)
    array_2 = np.array([[6], [7], [8]], float)
    print np.mean(array_1)
    print np.mean(array_2)
    print ""
    print np.dot(array_1, array_2)

Pandas

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8ay7tX26YxE.mp4


Pandas Playground – Series

Here’s the code

import pandas as pd
'''
The following code is to help you play with the concept of Series in Pandas.
You can think of a Series as a one-dimensional object that is similar to
an array, list, or column in a database. By default, it will assign an
index label to each item in the Series ranging from 0 to N, where N is
the number of items in the Series minus one.
Please feel free to play around with the concept of Series and see what it does
*This playground is inspired by Greg Reda's post on Intro to Pandas Data Structures:
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/
'''
# Change False to True to create a Series object
if False:
    series = pd.Series(['Dave', 'Cheng-Han', 'Udacity', 42, -1789710578])
    print series

'''
You can also manually assign indices to the items in the Series when
creating the series
'''
# Change False to True to see custom index in action
if False:
    series = pd.Series(['Dave', 'Cheng-Han', 359, 9001],
                       index=['Instructor', 'Curriculum Manager',
                              'Course Number', 'Power Level'])
    print series

'''
You can use index to select specific items from the Series
'''
# Change False to True to see Series indexing in action
if False:
    series = pd.Series(['Dave', 'Cheng-Han', 359, 9001],
                       index=['Instructor', 'Curriculum Manager',
                              'Course Number', 'Power Level'])
    print series['Instructor']
    print ""
    print series[['Instructor', 'Curriculum Manager', 'Course Number']]

'''
You can also use boolean operators to select specific items from the Series
'''
# Change False to True to see boolean indexing in action
if True:
    cuteness = pd.Series([1, 2, 3, 4, 5], index=['Cockroach', 'Fish', 'Mini Pig',
                                                 'Puppy', 'Kitten'])
    print cuteness > 3
    print ""
    print cuteness[cuteness > 3]


Pandas Playground – Dataframe

Here’s the code

import numpy as np
import pandas as pd
'''
The following code is to help you play with the concept of Dataframe in Pandas.
You can think of a Dataframe as something with rows and columns. It is
similar to a spreadsheet, a database table, or R's data.frame object.
*This playground is inspired by Greg Reda's post on Intro to Pandas Data Structures:
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/
'''
'''
To create a dataframe, you can pass a dictionary of lists to the Dataframe
constructor:
1) The key of the dictionary will be the column name
2) The associating list will be the values within that column.
'''
# Change False to True to see Dataframes in action
if False:
    data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
            'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                     'Lions', 'Lions'],
            'wins': [11, 8, 10, 15, 11, 6, 10, 4],
            'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
    football = pd.DataFrame(data)
    print football

'''
Pandas also has various functions that will help you understand some basic
information about your data frame. Some of these functions are:
1) dtypes: to get the datatype for each column
2) describe: useful for seeing basic statistics of the dataframe's numerical
   columns
3) head: displays the first five rows of the dataset
4) tail: displays the last five rows of the dataset
'''
# Change False to True to see these functions in action
if True:
    data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
            'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                     'Lions', 'Lions'],
            'wins': [11, 8, 10, 15, 11, 6, 10, 4],
            'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
    football = pd.DataFrame(data)
    print football.dtypes
    print ""
    print football.describe()
    print ""
    print football.head()
    print ""
    print football.tail()


Create a DataFrame

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hMxOuWJaVDA.mp4

Here is a link to the pandas documentation.

Here’s also an excellent series of tutorials as IPython notebooks

(Thank you to Dominique Luna for sharing!)

Also note: you do not need to use pandas.Series, you can pass in python lists as the values in this case:

olympic_medal_counts_df = DataFrame(
    {'country_name': countries,
     'gold': gold,
     'silver': silver,
     'bronze': bronze}) 
from pandas import DataFrame, Series
#################
# Syntax Reminder:
#
# The following code would create a two-column pandas DataFrame
# named df with columns labeled 'name' and 'age':
#
# people = ['Sarah', 'Mike', 'Chrisna']
# ages = [28, 32, 25]
# df = DataFrame({'name' : Series(people),
# 'age' : Series(ages)})
def create_dataframe():
    '''
    Create a pandas dataframe called 'olympic_medal_counts_df' containing
    the data from the table of 2014 Sochi winter olympics medal counts.
    The columns for this dataframe should be called
    'country_name', 'gold', 'silver', and 'bronze'.
    There is no need to specify row indexes for this dataframe
    (in this case, the rows will automatically be assigned numbered indexes).
    You do not need to call the function in your code when running it in the
    browser - the grader will do that automatically when you submit or test it.
    '''
    countries = ['Russian Fed.', 'Norway', 'Canada', 'United States',
                 'Netherlands', 'Germany', 'Switzerland', 'Belarus',
                 'Austria', 'France', 'Poland', 'China', 'Korea',
                 'Sweden', 'Czech Republic', 'Slovenia', 'Japan',
                 'Finland', 'Great Britain', 'Ukraine', 'Slovakia',
                 'Italy', 'Latvia', 'Australia', 'Croatia', 'Kazakhstan']
    gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0]
    bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]

    # your code here
    olympic_medal_counts_df = DataFrame(
        {'country_name': countries,
         'gold': gold,
         'silver': silver,
         'bronze': bronze})

    return olympic_medal_counts_df

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/-6Zw3Y4iXRY.mp4


Dataframe Columns

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/G4gFryXnrr8.mp4


Pandas Playground - Indexing Dataframes

Here’s the code

import pandas as pd
'''
You can think of a DataFrame as a group of Series that share an index.
This makes it easy to select specific columns that you want from the
DataFrame.
Also a couple pointers:
1) Selecting a single column from the DataFrame will return a Series
2) Selecting multiple columns from the DataFrame will return a DataFrame
*This playground is inspired by Greg Reda's post on Intro to Pandas Data Structures:
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/
'''
# Change False to True to see Series indexing in action
if False:
    data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
            'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                     'Lions', 'Lions'],
            'wins': [11, 8, 10, 15, 11, 6, 10, 4],
            'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
    football = pd.DataFrame(data)
    print football['year']
    print ''
    print football.year  # shorthand for football['year']
    print ''
    print football[['year', 'wins', 'losses']]

'''
Row selection can be done through multiple ways.
Some of the basic and common methods are:
1) Slicing
2) An individual index (through the functions iloc or loc)
3) Boolean indexing
You can also combine multiple selection requirements through boolean
operators like & (and) or | (or)
'''
# Change False to True to see boolean indexing in action
if True:
    data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
            'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                     'Lions', 'Lions'],
            'wins': [11, 8, 10, 15, 11, 6, 10, 4],
            'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
    football = pd.DataFrame(data)
    print football.iloc[[0]]
    print ""
    print football.loc[[0]]
    print ""
    print football[3:5]
    print ""
    print football[football.wins > 10]
    print ""
    print football[(football.wins > 10) & (football.team == "Packers")]


Pandas Vectorized Methods

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hvSEZxcH9PM.mp4
As a refresher on lambda: lambda functions are small inline functions that are defined on the fly in Python. lambda x: x >= 1 will take an input x and return x >= 1, a boolean that equals True or False.

In this example, map() and applymap() create a new Series or DataFrame by applying the lambda function to each element. Note that map() can only be used on a Series to return a new Series and applymap() can only be used on a DataFrame to return a new DataFrame.

For further reference, please refer to the official documentation on lambda:

Lambda Function
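As a small sketch of the two vectorized methods described above (the data is made up for illustration):

import pandas as pd

s = pd.Series([0, 1, 2, 3])
df = pd.DataFrame({'a': [0, 1], 'b': [2, 3]})

# map() applies the lambda element-wise to a Series and returns a new Series
print(s.map(lambda x: x >= 1))

# applymap() applies the lambda element-wise to a DataFrame and returns a new DataFrame
print(df.applymap(lambda x: x >= 1))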


Average Bronze Medals

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/AjniaYFfCeg.mp4
You might find using “boolean indexing” helpful for this problem.

Here is a link to the pandas documentation.

Here’s also an excellent series of tutorials as IPython notebooks

(Thank you to Dominique Luna for sharing!)

from pandas import DataFrame, Series
import numpy
def avg_medal_count():
    '''
    Compute the average number of bronze medals earned by countries who
    earned at least one gold medal.
    Save this to a variable named avg_bronze_at_least_one_gold. You do not
    need to call the function in your code when running it in the browser -
    the grader will do that automatically when you submit or test it.
    HINT-1:
    You can retrieve all of the values of a Pandas column from a
    data frame, "df", as follows:
    df['column_name']
    HINT-2:
    The numpy.mean function can accept as an argument a single
    Pandas column.
    For example, numpy.mean(df["col_name"]) would return the
    mean of the values located in "col_name" of a dataframe df.
    '''
    countries = ['Russian Fed.', 'Norway', 'Canada', 'United States',
                 'Netherlands', 'Germany', 'Switzerland', 'Belarus',
                 'Austria', 'France', 'Poland', 'China', 'Korea',
                 'Sweden', 'Czech Republic', 'Slovenia', 'Japan',
                 'Finland', 'Great Britain', 'Ukraine', 'Slovakia',
                 'Italy', 'Latvia', 'Australia', 'Croatia', 'Kazakhstan']
    gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0]
    bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]

    olympic_medal_counts = {'country_name': Series(countries),
                            'gold': Series(gold),
                            'silver': Series(silver),
                            'bronze': Series(bronze)}
    df = DataFrame(olympic_medal_counts)

    # YOUR CODE HERE
    # print df[df.gold >= 1]
    # print df[df.gold >= 1]['bronze']
    # Select the bronze counts of countries with at least one gold, then average them
    avg_bronze_at_least_one_gold = numpy.mean(df[df.gold >= 1]['bronze'])
    return avg_bronze_at_least_one_gold

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kibtHtPgWvs.mp4


Average Gold, Silver, and Bronze Medals

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/19b96U6dLtY.mp4

import numpy
from pandas import DataFrame, Series
def avg_medal_count():
    '''
    Using the dataframe's apply method, create a new Series called
    avg_medal_count that indicates the average number of gold, silver,
    and bronze medals earned amongst countries who earned at
    least one medal of any kind at the 2014 Sochi olympics. Note that
    the countries list already only includes countries that have earned
    at least one medal. No additional filtering is necessary.
    You do not need to call the function in your code when running it in the
    browser - the grader will do that automatically when you submit or test it.
    '''
    countries = ['Russian Fed.', 'Norway', 'Canada', 'United States',
                 'Netherlands', 'Germany', 'Switzerland', 'Belarus',
                 'Austria', 'France', 'Poland', 'China', 'Korea',
                 'Sweden', 'Czech Republic', 'Slovenia', 'Japan',
                 'Finland', 'Great Britain', 'Ukraine', 'Slovakia',
                 'Italy', 'Latvia', 'Australia', 'Croatia', 'Kazakhstan']
    gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0]
    bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]

    olympic_medal_counts = {'country_name': countries,
                            'gold': Series(gold),
                            'silver': Series(silver),
                            'bronze': Series(bronze)}
    df = DataFrame(olympic_medal_counts)

    # YOUR CODE HERE
    # Use a list of column labels (note the double brackets) to select multiple
    # columns, then apply numpy.mean column-wise to get a Series of averages.
    avg_medal_count = df[['gold', 'silver', 'bronze']].apply(numpy.mean)
    return avg_medal_count

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HbCmZuVp548.mp4


Matrix Multiplication and Numpy Dot

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/yAgqlTfWc9E.mp4

The second line in the green vector on the top line starting at 1:15 should read: “1 x 4 + 2 x 5”.

This vector should also be a row vector (1 x 3 matrix) instead of a column vector (3 x 1 matrix).

You can read more about numpy.dot or matrix multiplication with numpy below:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
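For reference, here is a tiny numpy.dot sketch of the row-vector-times-matrix case discussed in the correction above (the numbers are only for illustration):

import numpy as np

# A 1 x 2 row vector times a 2 x 3 matrix gives a 1 x 3 row vector
a = np.array([1, 2])
B = np.array([[4, 5, 6],
              [5, 6, 7]])

print(np.dot(a, B))  # [14 17 20]; the first entry is 1 x 4 + 2 x 5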


Olympics Medal Points

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/uKvbguVQYh4.mp4

import numpy
from pandas import DataFrame, Series
def numpy_dot():
    '''
    Imagine a point system in which each country is awarded 4 points for each
    gold medal, 2 points for each silver medal, and one point for each
    bronze medal.
    Using the numpy.dot function, create a new dataframe called
    'olympic_points_df' that includes:
        a) a column called 'country_name' with the country name
        b) a column called 'points' with the total number of points the country
           earned at the Sochi olympics.
    You do not need to call the function in your code when running it in the
    browser - the grader will do that automatically when you submit or test it.
    '''
    countries = ['Russian Fed.', 'Norway', 'Canada', 'United States',
                 'Netherlands', 'Germany', 'Switzerland', 'Belarus',
                 'Austria', 'France', 'Poland', 'China', 'Korea',
                 'Sweden', 'Czech Republic', 'Slovenia', 'Japan',
                 'Finland', 'Great Britain', 'Ukraine', 'Slovakia',
                 'Italy', 'Latvia', 'Australia', 'Croatia', 'Kazakhstan']
    gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0]
    bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]

    # YOUR CODE HERE
    olympic_medal_counts = {'country_name': countries,
                            'gold': Series(gold),
                            'silver': Series(silver),
                            'bronze': Series(bronze)}
    df = DataFrame(olympic_medal_counts)

    # Dot each row's [gold, silver, bronze] counts with the point weights [4, 2, 1]
    df['points'] = df[['gold', 'silver', 'bronze']].dot([4, 2, 1])
    olympic_points_df = df[['country_name', 'points']]
    return olympic_points_df

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/EQMUh4Id4Po.mp4


  1. numpy, which allows us to process large amounts of numerical data.
  2. pandas, which allows us to store large datasets and extract information from them.
  3. Numpy Playground
  4. Pandas Playground – Series
  5. Pandas Playground – Dataframe
  6. Pandas Playground - Indexing Dataframes
  7. map() and applymap() create a new Series or DataFrame by applying the lambda function to each element. Note that map() can only be used on a Series to return a new Series, and applymap() can only be used on a DataFrame to return a new DataFrame.
  8. Here is a link to the pandas documentation.
  9. Here’s also an excellent series of tutorials as IPython notebooks.
  10. intro-to-pandas-data-structures

DATA MODELING

scikit-learn Tutorial

  • Windows + R (command prompt)
    1. Type pip show scikit-learn --version; I already have scikit-learn installed.
    2. Type pip install --upgrade 'scikit-learn>=0.17,<0.18'.
       It shows "The system cannot find the file specified."
    3. Type conda -c install scikit-learn=0.17.
       It shows conda-script.py: error: unrecognized arguments: -c
  • Open Git Bash
    1. Type pip install --upgrade 'scikit-learn>=0.17,<0.18'; success.
    2. Type conda -c install scikit-learn=0.17; failed with conda-script.py: error: unrecognized arguments: -c

Introduction to scikit-learn

Scikit-Learn

Scikit-learn is an open source Machine Learning library that is built on NumPy, SciPy and matplotlib. It uses a Python interface and supports various regression, classification and clustering algorithms. You will be using this library throughout this program in your projects.

Example using Scikit-learn

To show you how we can leverage sklearn (short for scikit-learn), here is an example of a simple Linear Regression model being used to make predictions on the Boston housing prices dataset, which comes preloaded with sklearn. We will not dive deep into the dataset per se, nor will we split it into training and testing sets just yet (you will learn about the importance of this in the next lesson); the goal of this node is to give you a high-level view of how, with just a few lines of code, you can make predictions on a dataset using the sklearn tool.

This dataset consists of 506 samples with a dimensionality of 13. We will run a Linear Regression model on the feature set to make predictions on the prices.

We start by getting the necessary imports.

from sklearn import datasets # sklearn comes with a variety of preloaded datasets 
from sklearn import metrics # calculate how well our model is doing
from sklearn.linear_model import LinearRegression

There are several ways in which we can load datasets in sklearn. For now, we will start with the most basic way, using a dataset that comes preloaded.

# Load the dataset
housing_data = datasets.load_boston()

We now define the model we want to use and herein lies one of the main advantages of using this library.

linear_regression_model = LinearRegression()

Next, we can fit our Linear Regression model on our feature set to make predictions for our labels (the price of the houses). Here, housing_data.data is our feature set and housing_data.target contains the labels we are trying to predict.

linear_regression_model.fit(housing_data.data, housing_data.target)
Once our model is fit, we make predictions as follows:

predictions = linear_regression_model.predict(housing_data.data)

Lastly, we want to check how our model does by comparing our predictions with the actual label values. Since this is a regression problem, we will use the r2 score metric. You will learn about the various classification and regression metrics in future lessons.

score = metrics.r2_score(housing_data.target, predictions)

And there we have it. We have trained a regression model on a dataset and calculated how well our model does all with just a few lines of code and with all the math abstracted from us. In the next nodes, we will walk you through installing sklearn on your system, and you will work with Katie on a sample problem.
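Putting the snippets above together, a complete version of this example might look like the following (same steps as above, just collected into one runnable script):

from sklearn import datasets                      # preloaded datasets
from sklearn import metrics                       # to measure how well the model does
from sklearn.linear_model import LinearRegression

# Load the preloaded Boston housing dataset (506 samples, 13 features)
housing_data = datasets.load_boston()

# Define the model
linear_regression_model = LinearRegression()

# Fit the model on the features (housing_data.data) to predict the labels (housing_data.target)
linear_regression_model.fit(housing_data.data, housing_data.target)

# Make predictions on the same data (no train/test split yet, as noted above)
predictions = linear_regression_model.predict(housing_data.data)

# Compare the predictions with the actual labels using the r2 score
score = metrics.r2_score(housing_data.target, predictions)
print(score)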


scikit-learn Installation

scikit-learn Installation

First, check that you have a working python installation. Udacity uses python 2.7 for our code templates and in-browser exercises.

We recommend using pip to install packages. First get and install pip from here. If you are using Anaconda, you can also use the conda command to install packages.

  • To install scikit-learn via pip or anaconda:
    • open your terminal (terminal on a mac or cmd on a PC)
    • install sklearn with the command: pip install scikit-learn or conda install scikit-learn
  • If you do not use pip or conda, further installation instructions can be found here.
Important note about scikit-learn versioning

scikit-learn has recently come out with a stable release of its library with version v0.18. With this version comes a few changes to some of the functions we will talk about extensively in this course, such as train_test_split, gridSearchCV, ShuffleSplit, and learning_curves. The documentation available on scikit-learn’s website will reference v0.18, however Katie, Udacity’s quizzes, and our projects, are still written in v0.17. Please make sure that when using the documentation and scikit-learn, you reference version v0.17 and not version v0.18. In the near future, we will be updating our content to match the most current version.

Please see this forum post that provides more detail on this topic. If you have any additional questions or concerns, feel free to discuss them in the forums or email machine-support@udacity.com.

If you’ve accidentally installed version v0.18 through pip, not to worry! Use the command below to downgrade your scikit-learn version to v0.17:

pip install --upgrade 'scikit-learn>=0.17,<0.18'

If you are using the Anaconda distribution of Python and have scikit-learn installed as version v0.18, you can also use the command below to downgrade your scikit-learn version to v0.17:

conda -c install scikit-learn=0.17

scikit-learn Code

In this next section Katie will walk through using the scikit-learn (or sklearn) documentation with a Gaussian Naive Bayes model. For this exercise it is not important to know all of the details of Naive Bayes or the code Katie is demonstrating. Focus on taking in the basic layout of sklearn, which we can then use to evaluate and validate any data model.

We will cover Naive Bayes along with other useful supervised models in much more detail in the upcoming Supervised Machine Learning course and use what we learn from this course to evaluate each model’s strengths and weaknesses.

If you want a sneak peek into Naive Bayes, you can check out the documentation here.


Getting Started With sklearn

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/olGPVtH7KGU.mp4


Gaussian NB Example

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wpnDwiqTCJA.mp4


GaussianNB Deployment on Terrain Data

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/VBs6D4ggnYY.mp4

Quiz: GaussianNB Deployment On Terrain Data
To find the ClassifyNB.py script that you need to update for the quiz, you can click on the dropdown in the classroom code editor to get a list of files that will be used.

In the quiz that follows, the line that reads
pred = clf.predict(features_test)
is not necessary for drawing the decision boundary, at least as we’ve written the code.

However, the whole point of making a classifier is that you can make predictions with it, so be sure to keep it in mind since you’ll be using it in the quiz after this one.


#!/usr/bin/python
""" Complete the code in ClassifyNB.py with the sklearn
Naive Bayes classifier to classify the terrain data.
The objective of this exercise is to recreate the decision
boundary found in the lesson video, and make a plot that
visually shows the decision boundary """
from prep_terrain_data import makeTerrainData
from class_vis import prettyPicture, output_image
from ClassifyNB import classify
import numpy as np
import pylab as pl
features_train, labels_train, features_test, labels_test = makeTerrainData()
### the training data (features_train, labels_train) have both "fast" and "slow" points mixed
### in together--separate them so we can give them different colors in the scatterplot,
### and visually identify them
grade_fast = [features_train[ii][0] for ii in range(0, len(features_train)) if labels_train[ii]==0]
bumpy_fast = [features_train[ii][1] for ii in range(0, len(features_train)) if labels_train[ii]==0]
grade_slow = [features_train[ii][0] for ii in range(0, len(features_train)) if labels_train[ii]==1]
bumpy_slow = [features_train[ii][1] for ii in range(0, len(features_train)) if labels_train[ii]==1]
# You will need to complete this function imported from the ClassifyNB script.
# Be sure to change to that code tab to complete this quiz.
clf = classify(features_train, labels_train)
### draw the decision boundary with the text points overlaid
prettyPicture(clf, features_test, labels_test)
output_image("test.png", "png", open("test.png", "rb").read())
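For reference, a minimal sketch of what the completed classify function in ClassifyNB.py might look like using sklearn's GaussianNB (this is one possible solution, not necessarily the course's exact code):

from sklearn.naive_bayes import GaussianNB

def classify(features_train, labels_train):
    # Create a Gaussian Naive Bayes classifier and fit it to the training data
    clf = GaussianNB()
    clf.fit(features_train, labels_train)
    return clf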

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TcSnd3_hAy8.mp4


Nature of Data

Data Types 1 - Numeric Data

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xmuWRRTPS4k.mp4


Data Types 2 - Categorical Data

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/uc_FRItxRMs.mp4


Data Types 3 - Time Series Data

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6OmiG5zzZoA.mp4


Treatment of categorical data

Many algorithms assume that input data is numerical. For categorical data, this often means converting categorical data into numerical data that represents the same patterns.

One standard way of doing this is with one-hot encoding. There are built-in methods for this in scikit-learn.

Essentially, a categorical feature with 3 possible values is converted into three binary features corresponding to the values. The new feature corresponding to the category a datum belongs to has value 1, while the other new features have value 0.

For example, in a dataset on baseball players, one feature might be “Handedness” which can take values “left” or “right”. Then the data:

  • Joe
    • Handedness: right
  • Jim
    • Handedness: left

Would become:

  • Joe
    • Handedness/right: 1
    • Handedness/left: 0
  • Jim
    • Handedness/right: 0
    • Handedness/left: 1

For ordinal data, it often makes sense to simply assign the values to integers. So the following data:

  • Joe
    • Skill: low
  • Jim
    • Skill: medium
  • Jane
    • Skill: high

Would become:

  • Joe
    • Skill: 0
  • Jim
    • Skill: 1
  • Jane
    • Skill: 2

These approaches are not the only ones possible, but in general these simple approaches suffice; if there is a reason to use another encoding, that will depend on the nature of the data.
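As a small pandas sketch of both ideas (the names and values mirror the toy examples above; Jane's handedness and the exact column names are assumptions made for illustration):

import pandas as pd

players = pd.DataFrame({'name': ['Joe', 'Jim', 'Jane'],
                        'handedness': ['right', 'left', 'right'],
                        'skill': ['low', 'medium', 'high']})

# One-hot encode the nominal feature: one binary column per category
one_hot = pd.get_dummies(players['handedness'], prefix='handedness')

# Map the ordinal feature onto integers that preserve its order
skill_order = {'low': 0, 'medium': 1, 'high': 2}
players['skill_encoded'] = players['skill'].map(skill_order)

print(one_hot)
print(players[['name', 'skill_encoded']])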


Encoding using sklearn

Encoding in sklearn is done using the preprocessing module, which comes with a variety of options for manipulating data before analysis. We will focus on two forms of encoding for now, the LabelEncoder and the OneHotEncoder.

Label Encoder

First, we have to import the preprocessing library.

from sklearn import preprocessing

Let’s create a dummy dataframe named data with a column whose values we want to transform from categories to integers.

import pandas

# creating sample data
sample_data = {'name': ['Ray', 'Adam', 'Jason', 'Varun', 'Xiao'],
               'health': ['fit', 'slim', 'obese', 'fit', 'slim']}

# storing sample data in the form of a dataframe
data = pandas.DataFrame(sample_data, columns=['name', 'health'])

We have 3 different labels that we are looking to categorize: slim, fit, obese. To do this, we will call LabelEncoder() and fit it to the column we are looking to categorize.

label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(data['health'])

Once you have fit the label encoder to the column you want to encode, you can then transform that column to integer data based on the categories found in that column. That can be done as follows:

label_encoder.transform(data['health'])

This will give you the output:

array([0, 2, 1, 0, 2])

You can combine the fit and transform statements above by using label_encoder.fit_transform(data['health']).

The string categorical health data has been mapped as follows:

fit : 0
obese: 1
slim: 2

One thing to keep in mind when encoding data is the fact that you do not want to skew your analysis because of the numbers that are assigned to your categories. For example, in the above example, slim is assigned a value 2 and obese a value 1. This is not to say that the intention here is to have slim be a value that is empirically twice as likely to affect your analysis as compared to obese. In such situations it is better to one-hot encode your data, as all categories are assigned a 0 or a 1 value, thereby removing any unwanted biases that may creep in if you simply label encode your data.

One-hot Encoder

If we were to apply the one-hot transformation to the same example we had above, we’d do it in Pandas using get_dummies as follows:

pandas.get_dummies(data['health'])

We could do this in sklearn on the label encoded data using OneHotEncoder as follows:

ohe = preprocessing.OneHotEncoder() # creating OneHotEncoder object
label_encoded_data = label_encoder.fit_transform(data['health'])
ohe.fit_transform(label_encoded_data.reshape(-1,1))

One-Hot Encoding

# In this exercise we'll load the titanic data (from Project 0)
# And then perform one-hot encoding on the feature names
import numpy as np
import pandas as pd
# Load the dataset
X = pd.read_csv('titanic_data.csv')
# Limit to categorical data
X = X.select_dtypes(include=[object])
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
# TODO: Create a LabelEncoder object, which will turn all labels present in
# in each feature to numbers. For example, the labels ['cat', 'dog', 'fish']
# might be transformed into [0, 1, 2]
le = LabelEncoder()
# TODO: For each feature in X, apply the LabelEncoder's fit_transform
# function, which will first learn the labels for the feature (fit)
# and then change the labels to numbers (transform).
for feature in X:
    le.fit(X[feature])
    X[feature] = le.transform(X[feature])
# TODO: Create a OneHotEncoder object, which will create a feature for each
# label present in the data. For example, for a feature 'animal' that had
# the labels ['cat','dog','fish'], the new features (instead of 'animal')
# could be ['animal_cat', 'animal_dog', 'animal_fish']
ohe = OneHotEncoder()
# TODO: Apply the OneHotEncoder's fit_transform function to all of X, which will
# first learn of all the (now numerical) labels in the data (fit), and then
# change the data to one-hot encoded entries (transform).
ohe.fit(X)
onehotlabels = ohe.transform(X).toarray()

Quiz: One-Hot Encoding
Having trouble? Here is a useful forum discussion about this quiz.
Here are some other links you may find helpful - LabelEncoder, OneHotEncoder


Time series data leakage

When dealing with time-series data, it can be tempting to disregard the timing structure and simply treat it as the appropriate form of categorical or numerical data.

One important concern, however, arises if you are building a predictive project aimed at forecasting future data points. In this case, it is important NOT to use the future as a source of information! Since “hindsight is 20/20” and retrodictions are much easier than predictions, in predictive tasks it’s generally a good idea to use a training set made up of data from before a certain point, a validation set of data from some dates beyond that, and testing data leading up to the present. This way your algorithm won’t overfit by learning future trends.
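A minimal sketch of such a time-ordered split (the column names, dates, and split points are hypothetical): sort by time, then train on the earliest rows, validate on the next block, and test on the most recent rows, rather than splitting randomly.

import pandas as pd

# Hypothetical time series: one observation per day
df = pd.DataFrame({'date': pd.date_range('2015-01-01', periods=100),
                   'value': range(100)})
df = df.sort_values('date')

# Split chronologically: the past for training, later data for validation,
# and the most recent data for testing -- never the other way around
train = df.iloc[:60]
validation = df.iloc[60:80]
test = df.iloc[80:]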


A Hands-on Example

In the next section we’ll explore the famous Enron Email Dataset which was the focus of much of the Introduction to Machine Learning course.

While this specific dataset will play a less central role in this Nanodegree program, we will return to it a few times as an example to get practice with various techniques as they are introduced.

You can download our copy of the dataset here, along with the starting code for a variety of mini-projects. None of these mini-projects are required for completing the Nanodegree program, but they are great practice!




Datasets and Questions

Introduction

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TdopVWltgqM.mp4


What Is A POI

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wDQhif-MWuY.mp4


EVALUATION AND VALIDATION

Training & Testing

Benefits of Testing

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/7LGaeYfvRug.mp4


Features and Labels

As you continue your journey into designing ML algorithms using sklearn, you will come across two new terms namely, Features and Labels.

Features are individual measurable properties that you will be using to make predictions about your labels.

To understand this better, let’s use an example. Let’s say you are trying to design a model that will be able to predict whether you will like a particular kind of cuisine or not. For this case, the label is a Yes for when the model thinks you will like said cuisine and a No for when it thinks otherwise. The features here could be things like Sweetness, Spiciness, Bitterness, Tanginess and the like. One thing to note here is that when using our features we have to make sure that they are represented in a way that doesn’t skew one feature over another; in other words, it’s usually a good idea to normalize or standardize your features. You will learn about these concepts in future lessons.

For now, as long as you understand the premise of what features and labels are and how they are used, you can proceed to the next node where Sebastian will explain this concept using a visual example.
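
As a quick, made-up sketch of the cuisine example (the numbers and the GaussianNB choice are assumptions, not part of the lesson):

from sklearn.naive_bayes import GaussianNB

# each row describes a cuisine as [sweetness, spiciness, bitterness, tanginess]
features = [[7, 2, 1, 3],
            [2, 9, 4, 6],
            [5, 5, 2, 2]]
labels = ['Yes', 'No', 'Yes']          # whether you liked that cuisine

clf = GaussianNB()
clf.fit(features, labels)              # learn the mapping from features to labels
print(clf.predict([[6, 3, 1, 2]]))     # predict the label for a new cuisine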


Features and Labels Musical Example

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/rnv0-lG9yKU.mp4


Evaluation Metrics

  1. Classification metrics
  2. Regression metrics

Welcome to Evaluation Metrics Lesson

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/IHuFWRM9f9Q.mp4


Overview of Lesson

In this lesson we’ll look at a small selection of common performance metrics and evaluate some algorithms with them on the Titanic dataset you used earlier.

There are a few important things to keep in mind here:

  • There is a big difference in performance based on whether a train/test split is used.
  • In general, performance on all metrics is correlated. But some algorithms may end up doing better or worse in different situations.
  • The practical coding of any metric looks almost exactly the same! The difficulty comes in how to make the choice, not in how to implement it.
The topics covered in this lesson are:
  • Accuracy
  • Precision
  • Recall
  • Confusion Matrix
  • F1 score
  • Mean Absolute Error
  • Mean Squared Error

If you are familiar with these concepts you can skip ahead, but we do recommend completing this lesson as a refresher nonetheless.
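
For reference, here is a minimal sketch of computing the metrics listed above with sklearn (the toy labels below are an assumption, not the Titanic data):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)

# toy classification labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))      # fraction of correct predictions
print(precision_score(y_true, y_pred))     # of the predicted positives, how many are real
print(recall_score(y_true, y_pred))        # of the real positives, how many were found
print(f1_score(y_true, y_pred))            # harmonic mean of precision and recall
print(confusion_matrix(y_true, y_pred))    # counts of true/false positives and negatives

# regression metrics operate on continuous values instead of class labels
print(mean_absolute_error([2.5, 0.0, 2.1], [3.0, -0.5, 2.0]))
print(mean_squared_error([2.5, 0.0, 2.1], [3.0, -0.5, 2.0]))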


MANAGING ERROR AND COMPLEXITY

Causes of Error

  • matplotlib.pyplot.plot

    plt.plot(training_sizes,training_scores,'go',label='training_scores')
    plt.plot(training_sizes,testing_scores,'rs',label='testing_scores')
  • Bias-Variance Tradeoff

Causes of Error

Now that we have covered some basic metrics for measuring model performance, let us turn our attention to reasons why models exhibit errors in the first place.

In model prediction there are two main sources of errors that a model can suffer from.

Bias

Bias is error due to a model being unable to represent the complexity of the underlying data. A high-bias model is said to underfit the data.

Variance

Variance is error due to a model being overly sensitive to the limited data it has been trained on. A high-variance model is said to overfit the data.

In the coming videos, we will go over each in detail.


Error due to Bias

Error due to Bias - Accuracy and Underfitting

Bias occurs when a model has enough data but is not complex enough to capture the underlying relationships. As a result, the model consistently and systematically misrepresents the data, leading to low accuracy in prediction. This is known as underfitting. Simply put, bias occurs when we have an inadequate model.

Example 1

An example might be when we have objects that are classified by color and shape, for example easter eggs, but our model can only partition and classify objects by color. It would therefore consistently mislabel future objects–for example labeling rainbows as easter eggs because they are colorful.

Example 2

Another example would be continuous data that is polynomial in nature, with a model that can only represent linear relationships. In this case it does not matter how much data we feed the model because it cannot represent the underlying relationship. To overcome error from bias, we need a more complex model.


Error due to Variance

Error due to Variance - Precision and Overfitting

When training a model, we typically use a limited number of samples from a larger population. If we repeatedly train a model with randomly selected subsets of data, we would expect its predictions to differ based on the specific examples given to it. Here, variance is a measure of how much the predictions vary for any given test sample.

Some variance is normal, but too much variance indicates that the model is unable to generalize its predictions to the larger population. High sensitivity to the training set is also known as overfitting, and generally occurs when either the model is too complex or when we do not have enough data to support it.

We can typically reduce the variability of a model’s predictions and increase precision by training on more data. If more data is unavailable, we can also control variance by limiting our model’s complexity.
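
As a rough illustration of controlling complexity (the toy data and the decision tree's max_depth knob are assumptions, not from the lesson):

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.cross_validation import train_test_split   # sklearn 0.17; use model_selection in 0.18+

# toy data: a noisy sine wave
np.random.seed(0)
X = np.random.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + np.random.normal(scale=0.2, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in [1, 4, None]:   # very simple, moderate, unconstrained
    reg = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    # a large gap between training and testing scores suggests high variance (overfitting);
    # two low scores suggest high bias (underfitting)
    print("max_depth={}: train R^2={:.2f}, test R^2={:.2f}".format(
        depth, reg.score(X_train, y_train), reg.score(X_test, y_test)))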


Learning Curve

Learning Curve

Now that you understand the concepts of bias and variance, let us look at a way to identify how well our model is performing. The learning curve functionality from sklearn can help here: it lets us study how our model behaves as the number of training points grows.

To start with, we have to import the module:

from sklearn.learning_curve import learning_curve # sklearn 0.17
from sklearn.model_selection import learning_curve # sklearn 0.18

From the documentation, a reasonable implementation of the function would be as follows:

learning_curve(
       estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)

Here, estimator is the model we are using to make our predictions (for example, GaussianNB()); X and y are the features and labels respectively; cv is the cross-validation generator (for example, KFold()); n_jobs is the parameter that decides whether to run multiple operations in parallel; and train_sizes specifies the training-set sizes that will be used to generate the curve.

In the following quiz, you will define your learning curve for a model that we have designed for you and you will observe the results.


Noisy Data, Complex Model

Here’s the code

# In this exercise we'll examine a learner which has high variance, and tries to learn
# nonexistent patterns in the data.
# Use the learning curve function from sklearn.learning_curve to plot learning curves
# of both training and testing error.
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
from sklearn.learning_curve import learning_curve
from sklearn.cross_validation import KFold
from sklearn.metrics import explained_variance_score, make_scorer
import numpy as np

# Set the learning curve parameters; you'll need this for learning_curves
size = 1000
cv = KFold(size, shuffle=True)
score = make_scorer(explained_variance_score)

# Create a series of data that forces a learner to have high variance
X = np.round(np.reshape(np.random.normal(scale=5, size=2*size), (-1, 2)), 2)
y = np.array([[np.sin(x[0] + np.sin(x[1]))] for x in X])

def plot_curve():
    reg = DecisionTreeRegressor()
    reg.fit(X, y)
    print "Regressor score: {:.4f}".format(reg.score(X, y))

    # TODO: Use learning_curve imported above to create learning curves for both the
    # training data and testing data. You'll need 'size', 'cv' and 'score' from above.
    training_sizes, training_scores, testing_scores = learning_curve(
        DecisionTreeRegressor(), X, y, cv=cv, scoring=score)

    # TODO: Plot the training curves and the testing curves
    # Use plt.plot twice -- one for each score. Be sure to give them labels!
    plt.plot(training_sizes, training_scores, 'go', label='training_scores')
    plt.plot(training_sizes, testing_scores, 'rs', label='testing_scores')

    # Plot aesthetics
    plt.ylim(-0.1, 1.1)
    plt.ylabel("Curve Score")
    plt.xlabel("Training Points")
    plt.legend(bbox_to_anchor=(1.1, 1.1))
    plt.show()
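
Note that learning_curve returns one score per cross-validation fold at each training size, so the plt.plot calls above draw one series per fold. A common refinement (an assumption on my part, not part of the quiz solution) is to average across folds inside plot_curve() before plotting:

    plt.plot(training_sizes, np.mean(training_scores, axis=1), 'g-', label='training_scores')
    plt.plot(training_sizes, np.mean(testing_scores, axis=1), 'r-', label='testing_scores')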


Improving the Validity of a Model

There is a trade-off between the simplicity and the complexity of a model given a fixed set of data. If the model is too simple, it cannot capture the underlying relationships and misrepresents the data. If it is too complex, we need more data to learn the underlying relationship; otherwise it is very common for a model to infer relationships that do not actually exist in the data.

The key is to find the sweet spot that minimizes bias and variance by finding the right level of model complexity. Of course with more data any model can improve, and different models may be optimal.

To learn more about bias and variance, we recommend this essay by Scott Fortmann-Roe.

In addition to the subset of data chosen for training, what features you use from a given dataset can also greatly affect the bias and variance of your model.


Bias, Variance, and Number of Features

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/OurfO1ZR2GU.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/mpYpT6nZVEo.mp4


Bias, Variance & Number of Features Pt 2

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1lNAvDubBfI.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/X_AS8NBngsk.mp4


Overfitting by Eye

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/sJgPnuiHrs8.mp4


Representative Power of a Model

Introduction

Curse of Dimensionality
In this short lesson, we have Charles Isbell, Senior Associate Dean at Georgia Tech School of Computing and Michael Littman, former CS department chair at Rutgers University and current Professor at Brown University teach you about the curse of dimensionality.

These videos are from the OMSCS program at Georgia Tech.


Curse of Dimensionality

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/QZ0DtNFdDko.mp4


Curse of Dimensionality Two

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/OyPcbeiwps8.mp4


MODEL EVALUATION AND VALIDATION PROJECT

Predicting Boston Housing prices.

Overview

Project Overview

In this project, you will apply basic machine learning concepts on data collected for housing prices in the Boston, Massachusetts area to predict the selling price of a new home. You will first explore the data to obtain important features and descriptive statistics about the dataset. Next, you will properly split the data into testing and training subsets, and determine a suitable performance metric for this problem. You will then analyze performance graphs for a learning algorithm with varying parameters and training set sizes. This will enable you to pick the optimal model that best generalizes for unseen data. Finally, you will test this optimal model on a new sample and compare the predicted selling price to your statistics.

Project Highlights

This project is designed to get you acquainted to working with datasets in Python and applying basic machine learning techniques using NumPy and Scikit-Learn. Before being expected to use many of the available algorithms in the sklearn library, it will be helpful to first practice analyzing and interpreting the performance of your model.

Things you will learn by completing this project:

  • How to use NumPy to investigate the latent features of a dataset.
  • How to analyze various learning performance plots for variance and bias.
  • How to determine the best-guess model for predictions from unseen data.
  • How to evaluate a model’s performance on unseen data using previous data.

Software Requirements

Description

The Boston housing market is highly competitive, and you want to be the best real estate agent in the area. To compete with your peers, you decide to leverage a few basic machine learning concepts to assist you and a client with finding the best selling price for their home. Luckily, you’ve come across the Boston Housing dataset which contains aggregated data on various features for houses in Greater Boston communities, including the median value of homes for each of those areas. Your task is to build an optimal model based on a statistical analysis with the tools available. This model will then be used to estimate the best selling price for your clients’ homes.

Software and Libraries

This project uses the following software and Python libraries:

Python 2.7
NumPy
pandas
scikit-learn (v0.17)
matplotlib

You will also need to have software installed to run and execute a Jupyter Notebook.

If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer.


Starting the Project

For this assignment, you can find the boston_housing folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!

This project contains three files:

  • boston_housing.ipynb: This is the main file where you will be performing your work on the project.
  • housing.csv: The project dataset. You’ll load this data in the notebook.
  • visuals.py: This Python script contains helper functions that create the necessary visualizations.

In the Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command jupyter notebook boston_housing.ipynb to open up a browser window or tab to work with your notebook. Alternatively, you can use the command jupyter notebook or ipython notebook and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the project. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project.


Submitting the Project

Evaluation

Your project will be reviewed by a Udacity reviewer against the Predicting Boston Housing Prices project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named boston_housing for ease of access:

  • The boston_housing.ipynb notebook file with all questions answered and all code cells executed and displaying output.
  • An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.

Once you have collected these files and reviewed the project rubric, proceed to the project submission page.


Submission

Predicting Boston Housing Prices

The Boston housing market is highly competitive, and you want to be the best real estate agent in the area. To compete with your peers, you decide to leverage a few basic machine learning concepts to assist you and a client with finding the best selling price for their home. Luckily, you’ve come across the Boston Housing dataset which contains aggregated data on various features for houses in Greater Boston communities, including the median value of homes for each of those areas. Your task is to build an optimal model based on a statistical analysis with the tools available. This model will then be used to estimate the best selling price for your clients’ homes.

Project Files

For this assignment, you can find the boston_housing folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!

Evaluation

Your project will be reviewed by a Udacity reviewer against the Predicting Boston Housing Prices project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named boston_housing for ease of access:

  • The boston_housing.ipynb notebook file with all questions answered and all code cells executed and displaying output.
  • An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
I’m Ready!

When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.

If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.

What’s Next?

You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!

PROJECT

  • Predicting Boston Housing Prices project rubric
  • open Jupyter on Windows: press Windows + R, then type cd <path>
  • Question 9
    Do not forget to modify the parameter
    cv_sets = ShuffleSplit(X.shape[0], n_iter = 10, test_size = 0.20, random_state = 0) to fit the sklearn 0.17 version (see the sketch below).
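
A minimal sketch of the two forms (the toy X stand-in is an assumption; in the project, X is the feature matrix built in the notebook):

import numpy as np
# toy stand-in for the project's feature matrix
X = np.zeros((100, 3))

# sklearn 0.17: the splitter takes the number of samples as its first argument
from sklearn.cross_validation import ShuffleSplit
cv_sets = ShuffleSplit(X.shape[0], n_iter=10, test_size=0.20, random_state=0)

# sklearn 0.18+: no sample count; pass the splitter object on as cv=cv_sets
# from sklearn.model_selection import ShuffleSplit
# cv_sets = ShuffleSplit(n_splits=10, test_size=0.20, random_state=0)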

Project modification


Kaggle(stay tuned)

Career: Job Search Strategies

Opportunity can come when you least expect it, so when your dream job comes along, you want to be ready!

After completing these lessons, be sure to complete the Cover Letter Review project and 1 of the 3 Resume Review projects.

If you are a Nanodegree Plus student, Career Content and Career Development Projects are required for graduation.

If you are enrolled in a standard Nanodegree program, Career Content and Career Development Projects are optional and do not affect your graduation.

JOB SEARCH STRATEGIES

Build Your Resume

Cover Letter

Resume Review (Entry-level)

Resume Review (Career Change)

Resume Review (Prior Industry Experience)

Cover Letter Review

Introduction

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/axcFtHK6If4.mp4


NVIDIA

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/C6Rt9lxMqHs.mp4


Job Search Mindset

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/cBk7bno3KS0.mp4


Target Your Application to An Employer

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/X9JBzbrkcvs.mp4


Open Yourself Up to Opportunity

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1OamTNkk1xM.mp4


Refine Your Resume

Convey Your Skills Concisely

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xnQr3ohml9s.mp4


Effective Resume Components

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/AiFcaHRGdEA.mp4


Resume Structure

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/POM0MqLTj98.mp4


Describe Your Work Experiences

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/B1LED4txinI.mp4
Description bullet points should convey:

  • Action
  • Numbers
  • Success

Resume Reflection

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8Cj_tCp8mls.mp4


Resume Review

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/L3F2BFGYMtI.mp4
Types of Resume Formats
Resume formats can be split into three categories, depending on the job candidate:

  • Entry-level (about 0-3 years of experience)
  • Career Change (3+ years of work experience, looking to change career paths)
  • Prior Industry Experience (3+ years of work experience; looking to level up in their career path by upskilling)

Build Your Resume

Resumes are required in job applications and recruiting. The most effective resumes are ones aimed at a specific job. In this project, you will find a job posting and target your resume to those job duties and requirements. Once you complete this project, you’ve successfully applied targeted job search strategies and are ready to look for work!

Receive a review of your resume that is tailored to your level of experience. The format and organization of your resume will vary depending on your experience in the field. To ensure you’re highlighting your most relevant skills, you will submit your resume to 1 of 3 review projects that match your experience level best.

  • Entry-level: For entry-level job applicants with 0-3 years experience in the field. Best suited for applicants who have recently graduated from their formal education and have limited work experience.
  • Career Change: For those seeking a career change with 3+ years experience in an unrelated field. For example, if you’re a teacher looking for work as a data analyst, or even from project management to front-end development.
  • Prior Industry Experience: For applicants with 3+ years of prior experience in a related field. This would include those with experience in software development looking for work in mobile development, or even from data science to machine learning.
Project Resources
  1. Project Rubrics: Your project will be reviewed by a Udacity Career Reviewer against these rubrics.
  2. Project Checklists: Based on the project rubric, this is a handy checklist to use during your resume building.
  3. Career Resource Center: Find additional tips and guides on developing your resume.
Resume Template Options
  • Build your own! This will ensure your resume is unique.
  • [Resume Genius: Resume Templates](https://resumegenius.com/resume-templates)
  • [Resume Builder](https://www.livecareer.com/resume-builder)
Tips for Bullet Points

Submit Your Resume for Review

Submission Instructions
  1. Find a job posting that you would apply to now or after your Nanodegree graduation. Judge if you would be a good fit for the role. (Note: If you’re more than 75% qualified for the job on paper, you’re probably a good candidate and should give applying a shot!)
  2. Refine your resume to target it to that job posting.
  3. Copy and paste, or link, the job posting in “Notes to reviewer” during submission.
  4. Optional: Remove any sensitive information, such as your phone number, from the submission.
  5. Submit your targeted resume as a .pdf to one of the following project submission pages dependent on your experience:
Share your Resume with Udacity Hiring Partners

Udacity partners with employers, who are able to contact Udacity students and alumni via your Professional Profile. Once you’ve completed the resume review project, make sure to upload your reviewed resume to your Profile!


####

Supervised Learning

Learn how Supervised Learning models such as Decision Trees, SVMs, Neural Networks, etc. are trained to model and predict labeled data.

Project: Finding Donors for CharityML

For most students, this project takes approximately 8 - 21 hours to complete (about 1 - 3 weeks).
17 LESSONS, 1 PROJECT

P2 Finding Donors for CharityML

SUPERVISED LEARNING TASKS

DECISION TREES

ID3(stay tuned)

ARTIFICIAL NEURAL NETWORKS

SUPPORT VECTOR MACHINES

NONPARAMETRIC MODELS

BAYESIAN METHODS

ENSEMBLE OF LEARNERS

INTRODUCTION TO SUPERVISED LEARNING

Supervised Learning Intro

Supervised Learning

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Vc6KuGcfVPM.mp4


What You’ll Watch and Learn

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/zoN8QJYFka4.mp4


ML in The Google Self-Driving Car

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lL16AQItG1g.mp4


Supervised Learning What You’ll Do

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jgYqzo7UFsU.mp4


Acerous Vs. Non-Acerous

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TeFF9wXiFfs.mp4


Supervised Classification Example

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/buxApBhZCO0.mp4


Features and Labels Musical Example

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/rnv0-lG9yKU.mp4


Features Visualization Quiz

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/t0iflCpBUDA.mp4


Classification By Eye

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xeMDpSRTLWc.mp4


Introduction to Regression

More Regressions

Introduction

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_CznJ6phPsg.mp4


Parametric regression

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6EC1w_fs5u8.mp4


K nearest neighbor

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/CWCLQ6eu2Do.mp4


How to predict

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/go7ITLl79h8.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/r8PsDjf9scc.mp4


Kernel regression

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ZhJTGBbR18o.mp4


Parametric vs non parametric

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wKT8Ztzt6r0.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/PVOWHYJV8P4.mp4


Which problems are regression?


Are Polynomials Linear?


Regressions in sklearn

Continuous Output Quiz

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/udJvijJvs1M.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/FOwEL4S-SVo.mp4


Continuous Quiz

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Bp6oBbLw8qE.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/IC-fo_A0PxQ.mp4


DECISION TREES

Decision Trees

Difference between Classification and Regression

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/i04Pfrb71vk.mp4


More Decision Tree

Linearly Separable Data

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lCWGV6ZuXt0.mp4


Multiple Linear Questions

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/t1Y-nzgI1L4.mp4


ARTIFICIAL NEURAL NETWORKS

Neural Networks

Neural Networks

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/L_6idb3ZXB0.mp4


Neural Nets Mini-Project

Introduction

This section has a mix of coding assignments, multiple choice questions and fill in the blank type questions.

Please do check the instructor notes as we have included relevant forum posts that will help you work through these problems. You can find the instructor notes below the text/video nodes in the classroom.


Build a Perceptron

# ----------
#
# In this exercise, you will put the finishing touches on a perceptron class.
#
# Finish writing the activate() method by using np.dot to compute signal
# strength and then add in a threshold for perceptron activation.
#
# ----------
import numpy as np

class Perceptron:
    """
    This class models an artificial neuron with step activation function.
    """
    def __init__(self, weights=np.array([1]), threshold=0):
        """
        Initialize weights and threshold based on input arguments. Note that no
        type-checking is being performed here for simplicity.
        """
        self.weights = weights
        self.threshold = threshold

    def activate(self, inputs):
        """
        Takes in @param inputs, a list of numbers equal to length of weights.
        @return the output of a threshold perceptron with given inputs based on
        perceptron weights and threshold.
        """
        # TODO: calculate the strength with which the perceptron fires
        strength = np.dot(inputs, self.weights)
        # TODO: return 0 or 1 based on the threshold
        if strength > self.threshold:
            result = 1
        else:
            result = 0
        return result

def test():
    """
    A few tests to make sure that the perceptron class performs as expected.
    Nothing should show up in the output if all the assertions pass.
    """
    p1 = Perceptron(np.array([1, 2]), 0.)
    assert p1.activate(np.array([ 1, -1])) == 0  # < threshold --> 0
    assert p1.activate(np.array([-1,  1])) == 1  # > threshold --> 1
    assert p1.activate(np.array([ 2, -1])) == 0  # on threshold --> 0

if __name__ == "__main__":
    test()

Here is relevant forum post for this quiz.

Note that here, and throughout the rest of the mini-project, a signal strength equal to the threshold results in a 0 being output (rather than 1).

The dot product must be strictly greater than the threshold, rather than greater than or equal to it, to pass the assertion tests.


Threshold Meditation


Where to train Perceptrons


Perceptron Inputs


Neural Net Outputs


Perceptron Update Rule

# ----------
#
# In this exercise, you will update the perceptron class so that it can update
# its weights.
#
# Finish writing the update() method so that it updates the weights according
# to the perceptron update rule. Updates should be performed online, revising
# the weights after each data point.
#
# ----------
import numpy as np

class Perceptron:
    """
    This class models an artificial neuron with step activation function.
    """
    def __init__(self, weights=np.array([1]), threshold=0):
        """
        Initialize weights and threshold based on input arguments. Note that no
        type-checking is being performed here for simplicity.
        """
        self.weights = weights.astype(float)
        self.threshold = threshold

    def activate(self, values):
        """
        Takes in @param values, a list of numbers equal to length of weights.
        @return the output of a threshold perceptron with given inputs based on
        perceptron weights and threshold.
        """
        # First calculate the strength with which the perceptron fires
        strength = np.dot(values, self.weights)
        # Then return 0 or 1 depending on strength compared to threshold
        return int(strength > self.threshold)

    def update(self, values, train, eta=.1):
        """
        Takes in a 2D array @param values consisting of a LIST of inputs and a
        1D array @param train, consisting of a corresponding list of expected
        outputs. Updates internal weights according to the perceptron training
        rule using these values and an optional learning rate, @param eta.
        """
        # TODO: for each data point...
        for i in range(len(values)):
            # TODO: obtain the neuron's prediction for that point
            prediction = self.activate(values[i])
            # TODO: update self.weights based on prediction accuracy, learning
            # rate and input value
            self.weights += eta * (train[i] - prediction) * values[i]

def test():
    """
    A few tests to make sure that the perceptron class performs as expected.
    Nothing should show up in the output if all the assertions pass.
    """
    def sum_almost_equal(array1, array2, tol=1e-6):
        return sum(abs(array1 - array2)) < tol

    p1 = Perceptron(np.array([1, 1, 1]), 0)
    p1.update(np.array([[2, 0, -3]]), np.array([1]))
    assert sum_almost_equal(p1.weights, np.array([1.2, 1, 0.7]))

    p2 = Perceptron(np.array([1, 2, 3]), 0)
    p2.update(np.array([[3, 2, 1], [4, 0, -1]]), np.array([0, 0]))
    assert sum_almost_equal(p2.weights, np.array([0.7, 1.8, 2.9]))

    p3 = Perceptron(np.array([3, 0, 2]), 0)
    p3.update(np.array([[2, -2, 4], [-1, -3, 2], [0, 2, 1]]), np.array([0, 1, 0]))
    assert sum_almost_equal(p3.weights, np.array([2.7, -0.3, 1.7]))

if __name__ == "__main__":
    test()

Here is the relevant forum post for this quiz.


Layered Network Example


Linear Representational Power


Activation Function Quiz


Perceptron Vs Sigmoid


Sigmoid Learning


Gradient Descent Issues


SUPPORT VECTOR MACHINES

Math behind SVMs

Introduction

In this lesson, Charles and Mike will walk you through the math behind Support Vector Machines. If you would like to jump straight to the higher-level concepts and start coding with scikit-learn, you can head to the next lesson, where Sebastian and Katie will walk you through everything you need to get up and running with a working SVM model.


The Best Line

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/5yzSv4jYMyI.mp4


SVMs in Practice

Welcome to SVM

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/gnAmmyQ_ZcQ.mp4


Separating Line

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/mzKPXz-Yhwk.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/NTm_mA4akP4.mp4


NONPARAMETRIC MODELS

Instance Based Learning

Instance Based Learning Before

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ZTjot416e-g.mp4


BAYESIAN METHODS

Naive Bayes

Speed Scatterplot: Grade and Bumpiness

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/IMWsjjIeOrY.mp4


Bayesian Learning

Intro

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/PfJoHBLjkR8.mp4


Bayes Rule

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kv38EnIXSkY.mp4


Bayesian Inference

Intro

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/AL9LH06uztM.mp4


Joint Distribution

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/RN7drTE2_oI.mp4


Bayes NLP Mini-Project

Bad Handwriting Exposition

Imagine your boss has left you a message from a location with terrible reception. Several words are impossible to hear. Based on some transcriptions of previous messages he’s left, you want to fill in the remaining words. To do this, we will use Bayes’ Rule to find the probability that a given word is in the blank, given some other information about the message.

Recall Bayes Rule:

P(A|B) = P(B|A)*P(A)/P(B)

Or in our case

P(a certain word|surrounding words) = P(surrounding words|a certain word)*P(a certain word) / P(surrounding words)
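
As a tiny worked illustration of the formula (the counts below are made up for the sake of the arithmetic, not taken from the memo):

# Suppose "gonna" makes up 4 of 100 words, "need" makes up 5 of 100 words,
# and "gonna" precedes "need" in 4 of the 5 places "need" appears.
p_word = 5 / 100.0                   # P(a certain word)        = P("need")
p_surround_given_word = 4 / 5.0      # P(surrounding word | "need")
p_surround = 4 / 100.0               # P(surrounding word)      = P("gonna")

p_word_given_surround = p_surround_given_word * p_word / p_surround
print(p_word_given_surround)         # 1.0 -- in this toy count, every "gonna" is followed by "need"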


Calculations


Maximum Likelihood

Here’s the code

sample_memo = '''
Milt, we're gonna need to go ahead and move you downstairs into storage B. We have some new people coming in, and we need all the space we can get. So if you could just go ahead and pack up your stuff and move it down there, that would be terrific, OK?
Oh, and remember: next Friday... is Hawaiian shirt day. So, you know, if you want to, go ahead and wear a Hawaiian shirt and jeans.
Oh, oh, and I almost forgot. Ahh, I'm also gonna need you to go ahead and come in on Sunday, too...
Hello Peter, whats happening? Ummm, I'm gonna need you to go ahead and come in tomorrow. So if you could be here around 9 that would be great, mmmk... oh oh! and I almost forgot ahh, I'm also gonna need you to go ahead and come in on Sunday too, kay. We ahh lost some people this week and ah, we sorta need to play catch up.
'''
#
# Maximum Likelihood Hypothesis
#
#
# In this quiz we will find the maximum likelihood word based on the preceding word
#
# Fill in the NextWordProbability procedure so that it takes in sample text and a word,
# and returns a dictionary with keys the set of words that come after, whose values are
# the number of times the key comes after that word.
#
# Just use .split() to split the sample_memo text into words separated by spaces.
def NextWordProbability(sampletext, word):
    wordlist = sampletext.split()
    # positions where the given word occurs (empty if the word is absent)
    indices = [i for i, x in enumerate(wordlist) if x == word]
    # words that immediately follow each occurrence
    indices_after = [i + 1 for i in indices if i + 1 < len(wordlist)]
    newwordlist = [wordlist[i] for i in indices_after]
    wordcount = {}
    for nextword in newwordlist:
        if nextword in wordcount:
            wordcount[nextword] += 1
        else:
            wordcount[nextword] = 1
    return wordcount
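
As a quick sanity check of the function above (the expected output is my own reading of the memo, so treat it as an assumption): every occurrence of "gonna" in sample_memo is followed by "need".

print(NextWordProbability(sample_memo, "gonna"))   # expected: {'need': 4}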


NLP Disclaimer

In the previous exercise, you may have thought of some ways we might want to clean up the text available to us.

For example, we would certainly want to remove punctuation, and generally want to make all strings lowercase for consistency. In most language processing tasks we will have a much larger corpus of data, and will want to remove certain features.

Overall, just keep in mind that this mini-project is about Bayesian probability. If you’re interested in the details of language processing, you might start with this Kaggle project, which introduces a more detailed and standard approach to text processing very different from what we cover here.


Optimal Classifier Example


Optimal Classifier Exercise

Here’s the code

#------------------------------------------------------------------
#
# Bayes Optimal Classifier
#
# In this quiz we will compute the optimal label for a second missing word in a row
# based on the possible words that could be in the first blank
#
# Finish the procedure, LaterWords(), below
#
# You may want to import your code from the previous programming exercise!
#
sample_memo = '''
Milt, we're gonna need to go ahead and move you downstairs into storage B. We have some new people coming in, and we need all the space we can get. So if you could just go ahead and pack up your stuff and move it down there, that would be terrific, OK?
Oh, and remember: next Friday... is Hawaiian shirt day. So, you know, if you want to, go ahead and wear a Hawaiian shirt and jeans.
Oh, oh, and I almost forgot. Ahh, I'm also gonna need you to go ahead and come in on Sunday, too...
Hello Peter, whats happening? Ummm, I'm gonna need you to go ahead and come in tomorrow. So if you could be here around 9 that would be great, mmmk... oh oh! and I almost forgot ahh, I'm also gonna need you to go ahead and come in on Sunday too, kay. We ahh lost some people this week and ah, we sorta need to play catch up.
'''
corrupted_memo = '''
Yeah, I'm gonna --- you to go ahead --- --- complain about this. Oh, and if you could --- --- and sit at the kids' table, that'd be ---
'''
data_list = sample_memo.strip().split()
words_to_guess = ["ahead","could"]
def LaterWords(sample, word, distance):
    '''@param sample: a sample of text to draw from
    @param word: a word occurring before a corrupted sequence
    @param distance: how many words later to estimate (i.e. 1 for the next word, 2 for the word after that)
    @returns: a single word which is the most likely possibility
    '''
    # TODO: Given a word, collect the relative probabilities of possible following words
    # from @sample. You may want to import your code from the maximum likelihood exercise.

    # TODO: Repeat the above process--for each distance beyond 1, evaluate the words that
    # might come after each word, and combine them weighting by relative probability
    # into an estimate of what might appear next.

    return {}

print LaterWords(sample_memo, "ahead", 2)
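
The quiz leaves the body as a TODO. One possible sketch (my own approach, reusing NextWordProbability from the previous exercise, so treat it as an assumption rather than the official solution) is to propagate word probabilities forward distance steps and return the most likely word:

def LaterWordsSketch(sample, word, distance):
    # current maps each candidate word to the probability of being at this position
    current = {word: 1.0}
    for _ in range(distance):
        nextlevel = {}
        for w, prob in current.items():
            counts = NextWordProbability(sample, w)
            total = float(sum(counts.values()))
            if total == 0:
                continue
            for nw, c in counts.items():
                # weight each follower by how likely we were to be at w
                nextlevel[nw] = nextlevel.get(nw, 0.0) + prob * c / total
        current = nextlevel
    # return the single most likely word, or None if we ran out of followers
    return max(current, key=current.get) if current else None

print(LaterWordsSketch(sample_memo, "ahead", 2))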


Which Words Meditation


Joint Distribution Analysis


Domain Knowledge Quiz


Domain Knowledge Fill In


ENSEMBLE OF LEARNERS

Ensemble B&B

Ensemble Learning Boosting

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/w75WyRjRpAg.mp4


Back to Boosting

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/PHBd2glewzM.mp4


Boosting Tends to Overfit

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/UHxYXwvjH5c.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Hp4gJjSFSYc.mp4


SUPERVISED LEARNING PROJECT

Finding donors for CharityML

Overview

Project Overview

In this project, you will apply supervised learning techniques and an analytical mind on data collected for the U.S. census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause. You will first explore the data to learn how the census data is recorded. Next, you will apply a series of transformations and preprocessing techniques to manipulate the data into a workable format. You will then evaluate several supervised learners of your choice on the data, and consider which is best suited for the solution. Afterwards, you will optimize the model you’ve selected and present it as your solution to CharityML. Finally, you will explore the chosen model and its predictions under the hood, to see just how well it’s performing when considering the data it’s given.

Project Highlights

This project is designed to get you acquainted with the many supervised learning algorithms available in sklearn, and to also provide for a method of evaluating just how each model works and performs on a certain type of data. It is important in machine learning to understand exactly when and where a certain algorithm should be used, and when one should be avoided.

Things you will learn by completing this project:

  • How to identify when preprocessing is needed, and how to apply it.
  • How to establish a benchmark for a solution to the problem.
  • What each of several supervised learning algorithms accomplishes given a specific dataset.
  • How to investigate whether a candidate solution model is adequate for the problem.

Software Requirements

Description

CharityML is a fictitious charity organization located in the heart of Silicon Valley that was established to provide financial support for people eager to learn machine learning. After nearly 32,000 letters sent to people in the community, CharityML determined that every donation they received came from someone that was making more than $50,000 annually. To expand their potential donor base, CharityML has decided to send letters to residents of California, but to only those most likely to donate to the charity. With nearly 15 million working Californians, CharityML has brought you on board to help build an algorithm to best identify potential donors and reduce overhead cost of sending mail. Your goal will be to evaluate and optimize several different supervised learners to determine which algorithm will provide the highest donation yield while also reducing the total number of letters being sent.

Software and Libraries

This project uses the following software and Python libraries:

Python 2.7
NumPy
pandas
scikit-learn (v0.17)
matplotlib
You will also need to have software installed to run and execute a Jupyter Notebook.

If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer.


Starting the Project

For this assignment, you can find the finding_donors folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!

This project contains three files:

  • finding_donors.ipynb: This is the main file where you will be performing your work on the project.
  • census.csv: The project dataset. You’ll load this data in the notebook.
  • visuals.py: This Python script provides supplementary visualizations for the project. Do not modify.

In the Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command jupyter notebook finding_donors.ipynb to open up a browser window or tab to work with your notebook. Alternatively, you can use the command jupyter notebook or ipython notebook and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the project. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project.


Submitting the Project

Evaluation

Your project will be reviewed by a Udacity reviewer against the Finding Donors for CharityML project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named finding_donors for ease of access:

  • The finding_donors.ipynb notebook file with all questions answered and all code cells executed and displaying output.
  • An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
    Once you have collected these files and reviewed the project rubric, proceed to the project submission page.

Submission

Finding Donors for CharityML

CharityML is a fictitious charity organization located in the heart of Silicon Valley that was established to provide financial support for people eager to learn machine learning. After nearly 32,000 letters sent to people in the community, CharityML determined that every donation they received came from someone that was making more than $50,000 annually. To expand their potential donor base, CharityML has decided to send letters to residents of California, but to only those most likely to donate to the charity. With nearly 15 million working Californians, CharityML has brought you on board to help build an algorithm to best identify potential donors and reduce overhead cost of sending mail. Your goal will be to evaluate and optimize several different supervised learners to determine which algorithm will provide the highest donation yield while also reducing the total number of letters being sent.

Project Files

For this assignment, you can find the finding_donors folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!

Evaluation

Your project will be reviewed by a Udacity reviewer against the Finding Donors for CharityML project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named finding_donors for ease of access:

  • The finding_donors.ipynb notebook file with all questions answered and all code cells executed and displaying output.
  • An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
    I’m Ready!
    When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.

If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.

What’s Next?

You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!


PROJECT


Unsupervised Learning

Learn how to find patterns and structures in unlabeled data, perform feature transformations and improve the predictive performance of your models.
Project: Creating Customer Segments
For most students, this project takes approximately 10 - 15 hours to complete (about 1 - 2 weeks).
P3 Creating Customer Segments

CLUSTERING

Introduction to Unsupervised Learning

Unsupervised Learning

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8oZpT6Hekhk.mp4


What You’ll Watch and Learn

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1a68kAJAgIU.mp4


Clustering

Unsupervised Learning

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Mx9f99bRB3Q.mp4


Clustering Movies

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/g8PKffm8IRY.mp4


More Clustering

Single Linkage Clustering

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HfikjFVM3dg.mp4

Quiz: Single Linkage Clustering
Please use a comma to separate the two objects that will be linked in your answer. For instance, to describe a link from a to b, write “a,b” as your answer in the box.



https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/vytc9CsjjAs.mp4


Single Linkage Clustering Two

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/aojgUed9M0w.mp4


Clustering Mini-Project

Clustering Mini-Project Video

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/68EGMItJiNM.mp4


K-Means Clustering Mini-Project

In this project, we’ll apply k-means clustering to our Enron financial data. Our final goal, of course, is to identify persons of interest; since we have labeled data, this is not a question that particularly calls for an unsupervised approach like k-means clustering.

Nonetheless, you’ll get some hands-on practice with k-means in this project, and play around with feature scaling, which will give you a sneak preview of the next lesson’s material.
The Enron dataset can be found here.


Clustering Features

The starter code can be found in k_means/k_means_cluster.py, which reads in the email + financial (E+F) dataset and gets us ready for clustering. You’ll start with performing k-means based on just two financial features–take a look at the code, and determine which features the code uses for clustering.

Run the code, which will create a scatterplot of the data. Think a little bit about what clusters you would expect to arise if 2 clusters are created.


Deploying Clustering

Deploy k-means clustering on the financial_features data, with 2 clusters specified as a parameter. Store your cluster predictions to a list called pred, so that the Draw() command at the bottom of the script works properly. In the scatterplot that pops up, are the clusters what you expected?
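
A minimal sketch of that step (the toy financial_features list below is a stand-in; in the mini-project it comes from the starter code in k_means_cluster.py):

from sklearn.cluster import KMeans

# toy stand-in for the two-feature financial data built by the starter code
financial_features = [[2.0e5, 1.0e6], [1.5e5, 5.0e5], [8.0e5, 3.0e7], [9.0e5, 2.5e7]]

clf = KMeans(n_clusters=2)
pred = clf.fit_predict(financial_features)   # one cluster label per data point
print(pred)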


FEATURE ENGINEERING

Feature Scaling

Chris’s T-Shirt Size (Intuition)

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/oaqjLyiKOIA.mp4


A Metric for Chris

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/O0bvLU4l0is.mp4


Feature Selection

Introduction

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/UAMwTr3cnok.mp4


Feature Selection

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8CpRLplmdqE.mp4


DIMENSIONALITY REDUCTION

PCA

Data Dimensionality

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/gg7SAMMl4kM.mp4


Trickier Data Dimensionality

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/-dcNhrSPmoY.mp4


PCA Mini-Project

PCA Mini-Project Intro

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/rR68JXwKBxE.mp4


PCA Mini-Project

Our discussion of PCA spent a lot of time on theoretical issues, so in this mini-project we’ll ask you to play around with some sklearn code. The eigenfaces code is interesting and rich enough to serve as the testbed for this entire mini-project.

The starter code can be found in pca/eigenfaces.py. This was mostly taken from the example found here, on the sklearn documentation.
Take note when running the code that one of the parameters for the SVC function called on line 94 of pca/eigenfaces.py has changed. For the ‘class_weight’ parameter, the argument string “auto” is a valid value for sklearn version 0.16 and prior, but it will be deprecated by 0.19. If you are running sklearn version 0.17 or later, the expected argument string is “balanced”. If you get an error or warning when running pca/eigenfaces.py, make sure that the argument on line 98 matches your installed version of sklearn.
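
As a hedged illustration of that one change (the rbf kernel shown here is an assumed example, not the exact line from eigenfaces.py):

from sklearn.svm import SVC

# sklearn <= 0.16:
# clf = SVC(kernel='rbf', class_weight='auto')

# sklearn >= 0.17:
clf = SVC(kernel='rbf', class_weight='balanced')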


Feature Transformation

Introduction

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/J9JsMNownYM.mp4


Feature Transformation

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/B6mPphwAXZk.mp4


Summary

What we have learned

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/74oyGTdFp0Y.mp4


UNSUPERVISED LEARNING PROJECT

Things you will learn by completing this project:

  • How to apply preprocessing techniques such as feature scaling and outlier detection.
  • How to interpret data points that have been scaled, transformed, or reduced from PCA.
  • How to analyze PCA dimensions and construct a new feature space.
  • How to optimally cluster a set of data to find hidden patterns in a dataset.
  • How to assess information given by cluster data and use it in a meaningful way.

jupyter notebook customer_segments.ipynb

Creating Customer Segments project rubric

submit project

review

second review

seaborn.heatmap()

git project:

  1. open git bash and type cd <path> (escape spaces with \ and use / between directories)
  2. git init, git status, git add <>, git commit -m "description"
  3. create a repo on GitHub
  4. git remote add origin URL, then git push origin master
  5. git push origin master
  6. git add <>, git commit -m "second submit", git push origin master

Identifying customers by clustering them.

Overview

Project Overview

In this project you will apply unsupervised learning techniques on product spending data collected for customers of a wholesale distributor in Lisbon, Portugal to identify customer segments hidden in the data. You will first explore the data by selecting a small subset to sample and determine if any product categories highly correlate with one another. Afterwards, you will preprocess the data by scaling each product category and then identifying (and removing) unwanted outliers. With the good, clean customer spending data, you will apply PCA transformations to the data and implement clustering algorithms to segment the transformed customer data. Finally, you will compare the segmentation found with an additional labeling and consider ways this information could assist the wholesale distributor with future service changes.

Project Highlights

This project is designed to give you a hands-on experience with unsupervised learning and work towards developing conclusions for a potential client on a real-world dataset. Many companies today collect vast amounts of data on customers and clientele, and have a strong desire to understand the meaningful relationships hidden in their customer base. Being equipped with this information can assist a company engineer future products and services that best satisfy the demands or needs of their customers.

Things you will learn by completing this project:

  • How to apply preprocessing techniques such as feature scaling and outlier detection.
  • How to interpret data points that have been scaled, transformed, or reduced from PCA.
  • How to analyze PCA dimensions and construct a new feature space.
  • How to optimally cluster a set of data to find hidden patterns in a dataset.
  • How to assess information given by cluster data and use it in a meaningful way.

Software Requirements

Description

A wholesale distributor recently tested a change to their delivery method for some customers, by moving from a morning delivery service five days a week to a cheaper evening delivery service three days a week. Initial testing did not discover any significant unsatisfactory results, so they implemented the cheaper option for all customers. Almost immediately, the distributor began getting complaints about the delivery service change and customers were canceling deliveries — losing the distributor more money than what was being saved. You’ve been hired by the wholesale distributor to find what types of customers they have to help them make better, more informed business decisions in the future. Your task is to use unsupervised learning techniques to see if any similarities exist between customers, and how to best segment customers into distinct categories.

Software and Libraries

This project uses the following software and Python libraries:

Python 2.7
NumPy
pandas
scikit-learn (v0.17)
matplotlib

You will also need to have software installed to run and execute a Jupyter Notebook.

If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer.


Starting the Project

For this assignment, you can find the customer_segments folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!

This project contains three files:

  • customer_segments.ipynb: This is the main file where you will be performing your work on the project.
  • customers.csv: The project dataset. You’ll load this data in the notebook.
  • visuals.py: This Python script provides supplementary visualizations for the project. Do not modify.

In the Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command jupyter notebook customer_segments.ipynb to open up a browser window or tab to work with your notebook. Alternatively, you can use the command jupyter notebook or ipython notebook and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the project. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project.


Submitting the Project

Evaluation

Your project will be reviewed by a Udacity reviewer against the Creating Customer Segments project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named customer_segments for ease of access:

  • The customer_segments.ipynb notebook file with all questions answered and all code cells executed and displaying output.
  • An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.

Once you have collected these files and reviewed the project rubric, proceed to the project submission page.


Submission

Creating Customer Segments

A wholesale distributor recently tested a change to their delivery method for some customers, by moving from a morning delivery service five days a week to a cheaper evening delivery service three days a week. Initial testing did not discover any significant unsatisfactory results, so they implemented the cheaper option for all customers. Almost immediately, the distributor began getting complaints about the delivery service change and customers were canceling deliveries — losing the distributor more money than what was being saved. You’ve been hired by the wholesale distributor to find what types of customers they have to help them make better, more informed business decisions in the future. Your task is to use unsupervised learning techniques to see if any similarities exist between customers, and how to best segment customers into distinct categories.

Project Files

For this assignment, you can find the customer_segments folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!

Evaluation

Your project will be reviewed by a Udacity reviewer against the Creating Customer Segments project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named customer_segments for ease of access:

  • The customer_segments.ipynb notebook file with all questions answered and all code cells executed and displaying output.
  • An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
I’m Ready!

When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.

If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.

What’s Next?

You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!

Supporting Materials

Videos Zip File

Career: Networking

In the following lesson, you will learn how to tell your unique story to recruiters in a succinct and professional but relatable way.

After completing these lessons, be sure to complete the online profile review projects, such as LinkedIn Profile Review.

If you are a Nanodegree Plus student, Career Content and Career Development Projects are required for graduation.

If you are enrolled in a standard Nanodegree program, Career Content and Career Development Projects are optional and do not affect your graduation.

NETWORKING

Develop Your Personal Brand

Why Network?

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/exjEm9Paszk.mp4


Elevator Pitch

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/S-nAHPrkQrQ.mp4


Personal Branding

How to Stand Out

Imagine you’re a hiring manager for a company, and you need to pick 5 people to interview for a role. But you get 50 applications, and everyone seems pretty qualified. How do you compare job candidates? You’ll probably pick the candidates that stand out the most to you.

Personal Stories

The thing that always makes a job candidate unique is their personal story - their passion and how they got there. Employers aren’t just looking for someone with the skills, but they’re looking for someone who can drive the company’s mission and will be a part of innovation. That’s why they need to know your work ethic and what drives you.

As someone wanting to impress an employer, you need to tell your personal story. You want employers to know how you solve problems, overcome challenges, achieve results. You want employers to know what excites you, what motivates you, what drives you forward.

All of this can be achieved through effective storytelling, and effective branding.

I’ll let you know I’ve branded and rebranded myself many times. That’s okay - people are complex and have multiple interests that change over time.

In this next video, we’ll meet my coworker Chris who will show us how he used personal branding to help him in his recent career change.
Resources
Blog post: Storytelling, Personal Branding, and Getting Hired


Meet Chris

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0ccflD9x5WU.mp4

Resources
Blog post: Overcome Imposter Syndrome


Elevator Pitch

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0QtgTG49E9I.mp4


Pitching to a Recruiter

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/LxAdWaA-qTQ.mp4


Use Your Elevator Pitch

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/e-v60ieggSs.mp4


Optimize Your LinkedIn Profile

Why LinkedIn

LinkedIn is the most popular professional networking platform out there, so most recruiters use it to find job seekers. It’s so common for hiring teams to use LinkedIn to find and look at candidates, that it’s almost a red flag if they’re unable to find a LinkedIn profile for you.

It’s also a great platform for you to connect with other people in your field. Udacity for example has an Alumni LinkedIn group where graduates can collaborate on projects, practice job interviews, or discuss new trends in the industry together. Connecting with a fellow alum and asking for a referral would increase your chances of getting an interview.

Find Connections

The best way to use your LinkedIn effectively, however, is to have over 500 connections.

This may seem like a lot, but once you get rolling, you’ll get to that number fast. After you actively start using it, by joining groups and going to networking events, your number of connections will climb. You are more likely to show up in search results on LinkedIn if you have more connections, which means you’ll be more visible to recruiters.

Join Groups

Increasing the group of people you’re connected with also exposes you to what they’re working on or have done. For example, if you move to a new city, you can search your network to see who lives in the area, and ask for recommendations on apartment hunting, job leads, or other advice on adjusting to life in another city.

Also, if you’re active in a LinkedIn group or if you frequently write LinkedIn blog posts, you’ll increase your visibility on the platform and likelihood that a recruiter will find your profile.

How to Build Your LinkedIn Profile

LinkedIn guides you well when filling out your profile. It tells you if your profile is strong and offers recommendations on how to improve it. We recommend you follow LinkedIn’s advice because it’ll increase your visibility on the network, thus increasing the number of opportunities you may come across.

Tips for an Awesome LinkedIn Profile

In the lessons on conducting a successful job search and resume writing, we talk about how you can describe your work experiences in a way that targets a specific job.

Use what you learn to describe your experiences in LinkedIn’s projects and work sections. You can even copy and paste over the bullet points in your resume to the work or project sections of LinkedIn. Making sure your resume and LinkedIn are consistent helps build your personal brand.

Find Other Networking Platforms

Remember that LinkedIn isn’t the only professional networking platform out there. If you do have a great LinkedIn profile, that means you can also build an amazing profile on other platforms. Find some recommendations for online profiles on the Career Resource Center.

Up Next

By now, you know how to target your job profile to your dream job. You know how to market yourself effectively through building off your elevator pitch. Being confident in this will help you network naturally, whether on LinkedIn or at an event in-person.

Move on to the LinkedIn Profile Review and get personalized feedback on your online presence.


Networking Your Way to a New Job
Career and Job Fairs Do’s and Don’ts
What are career mixers?


GitHub Profile Review


LinkedIn Profile Review


Udacity Professional Profile Review

Reinforcement Learning

DUE OCT 19
Use Reinforcement Learning algorithms like Q-Learning to train artificial agents to take optimal actions in an environment.

Project: Train a Smartcab to Drive

For most students, this project takes approximately 15 - 21 hours to complete (about 2 - 3 weeks).
P4 Train a Smartcab to Drive

Markov Decision Processes

  • Further details on this quiz can be found in Chapter 17 of Artificial Intelligence: A Modern Approach

REINFORCEMENT LEARNING

  • Andrew Moore’s slides on Zero-Sum Games
  • Andrew Moore’s slides on Non-Zero-Sum Games
  • This paper offers a summary and an investigation of the field of reinforcement learning. It’s long, but chock-full of information!

PROJECT

Software Requirements

pygame

Mac: conda install -c https://conda.anaconda.org/quasiben pygame
Linux: conda install -c https://conda.anaconda.org/tlatorre pygame
Windows: conda install -c https://conda.anaconda.org/prkrekel pygame

Common Problems with PyGame

Train a Smartcab to Drive project rubric

submit

Windows: press Win+R, then type pip install pygame

review

REINFORCEMENT LEARNING

Introduction to Reinforcement Learning

Reinforcement Learning

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/PeAHckcWFS0.mp4


What You’ll Watch and Learn

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Z6ATPu4b9nc.mp4


Reinforcement Learning What You’ll Do

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1vQQphPLnkM.mp4


Markov Decision Processes

Introduction

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_ocNerSvh5Y.mp4


Reinforcement Learning

Reinforcement Learning

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HeYSFWPX_4k.mp4


Rat Dinosaurs

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/h7ExhVneBDU.mp4


GAME THEORY

Game Theory

Game Theory

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/vYHk1SPpnmQ.mp4


What Is Game Theory?

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jwlteKFyiHU.mp4


PROJECT

Train a cab to drive itself.

Overview


Software Requirements

Description

In the not-so-distant future, taxicab companies across the United States no longer employ human drivers to operate their fleet of vehicles. Instead, the taxicabs are operated by self-driving agents, known as smartcabs, to transport people from one location to another within the cities those companies operate. In major metropolitan areas, such as Chicago, New York City, and San Francisco, an increasing number of people have come to depend on smartcabs to get to where they need to go as safely and reliably as possible. Although smartcabs have become the transport of choice, concerns have arisen that a self-driving agent might not be as safe or reliable as human drivers, particularly when considering city traffic lights and other vehicles. To alleviate these concerns, your task as an employee for a national taxicab company is to use reinforcement learning techniques to construct a demonstration of a smartcab operating in real-time to prove that both safety and reliability can be achieved.

Software Requirements

This project uses the following software and Python libraries:

  • Python 2.7
  • NumPy
  • pandas
  • matplotlib
  • PyGame

If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer. pygame can then be installed using one of the following commands:

Mac: conda install -c https://conda.anaconda.org/quasiben pygame
Linux: conda install -c https://conda.anaconda.org/tlatorre pygame
Windows: conda install -c https://conda.anaconda.org/prkrekel pygame

Please note that installing pygame can be done using pip as well.

You can run an example to make sure pygame is working before actually performing the project by running:

python -m pygame.examples.aliens


Common Problems with PyGame

Fixing Common PyGame Problems

The PyGame library can in some cases require a bit of troubleshooting to work correctly for this project. While the PyGame aspect of the project is not required for a successful submission (you can complete the project without a visual simulation, although it is more difficult), it is very helpful to have it working! If you encounter an issue with PyGame, first see these helpful links below that are developed by communities of users working with the library:

Problems most often reported by students

“PyGame won’t install on my machine; there was an issue with the installation.”
Solution: As has been recommended for previous projects, Udacity suggests that you use the Anaconda distribution of Python, which then allows you to install PyGame through the conda-specific command.

“I’m seeing a black screen when running the code; output says that it can’t load car images.”
Solution: The code will not operate correctly unless it is run from the top-level directory for smartcab. The top-level directory is the one that contains the README and the project notebook.

If you continue to have problems with the project code in regards to PyGame, you can also use the discussion forums to find posts from students that encountered issues that you may be experiencing. Additionally, you can seek help from a swath of students in the MLND Student Slack Community.


Starting the Project

For this assignment, you can find the smartcab folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!

This project contains three directories:

  • /logs/: This folder will contain all log files that are given from the simulation when specific prerequisites are met.
  • /images/: This folder contains various images of cars to be used in the graphical user interface. You will not need to modify or create any files in this directory.
  • /smartcab/: This folder contains the Python scripts that create the environment, graphical user interface, the simulation, and the agents. You will not need to modify or create any files in this directory except for agent.py.

It also contains two files:

  • smartcab.ipynb: This is the main file where you will answer questions and provide an analysis for your work.
  • visuals.py: This Python script provides supplementary visualizations for the analysis. Do not modify.

Finally, in /smartcab/ are the following four files:

  • Modify:

    • agent.py: This is the main Python file where you will be performing your work on the project.
  • Do not modify:
    • environment.py: This Python file will create the smartcab environment.
    • planner.py: This Python file creates a high-level planner for the agent to follow towards a set goal.
    • simulator.py: This Python file creates the simulation and graphical user interface.
Running the Code

In a terminal or command window, navigate to the top-level project directory smartcab/ (that contains the three project directories) and run one of the following commands:

python smartcab/agent.py or
python -m smartcab.agent

This will run the agent.py file and execute your implemented agent code into the environment. Additionally, use the command jupyter notebook smartcab.ipynb from this same directory to open up a browser window or tab to work with your analysis notebook. Alternatively, you can use the command jupyter notebook or ipython notebook and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the implementation necessary for your agent.py agent file. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project.


Definitions

Environment

The smartcab operates in an ideal, grid-like city (similar to New York City), with roads going in the North-South and East-West directions. Other vehicles will certainly be present on the road, but there will be no pedestrians to be concerned with. At each intersection there is a traffic light that either allows traffic in the North-South direction or the East-West direction. U.S. Right-of-Way rules apply:

  • On a green light, a left turn is permitted if there is no oncoming traffic making a right turn or coming straight through the intersection.
  • On a red light, a right turn is permitted if no oncoming traffic is approaching from your left through the intersection. To understand how to correctly yield to oncoming traffic when turning left, you may refer to this official drivers’ education video, or this passionate exposition.
Inputs and Outputs

Assume that the smartcab is assigned a route plan based on the passengers’ starting location and destination. The route is split at each intersection into waypoints, and you may assume that the smartcab, at any instant, is at some intersection in the world. Therefore, the next waypoint to the destination, assuming the destination has not already been reached, is one intersection away in one direction (North, South, East, or West). The smartcab has only an egocentric view of the intersection it is at: It can determine the state of the traffic light for its direction of movement, and whether there is a vehicle at the intersection for each of the oncoming directions. For each action, the smartcab may either idle at the intersection, or drive to the next intersection to the left, right, or ahead of it. Finally, each trip has a time to reach the destination which decreases for each action taken (the passengers want to get there quickly). If the allotted time becomes zero before reaching the destination, the trip has failed.

Rewards and Goal

The smartcab will receive positive or negative rewards based on the action it has taken. Expectedly, the smartcab will receive a small positive reward when making a good action, and a varying amount of negative reward dependent on the severity of the traffic violation it would have committed. Based on the rewards and penalties the smartcab receives, the self-driving agent implementation should learn an optimal policy for driving on the city roads while obeying traffic rules, avoiding accidents, and reaching passengers’ destinations in the allotted time.
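
As a rough sketch of what the learning step can look like, here is a minimal epsilon-greedy Q-learning update in plain Python. The state encoding, hyperparameter values, and tie-breaking rule are illustrative assumptions, not the required agent.py implementation.

import random

Q = {}                                    # maps (state, action) -> learned value
actions = [None, 'forward', 'left', 'right']
alpha, gamma, epsilon = 0.5, 0.2, 0.1     # learning rate, discount factor, exploration rate

def choose_action(state):
    # epsilon-greedy: usually exploit the best known action, occasionally explore
    if random.random() < epsilon:
        return random.choice(actions)
    q_values = [Q.get((state, a), 0.0) for a in actions]
    best = max(q_values)
    return random.choice([a for a, q in zip(actions, q_values) if q == best])

def learn(state, action, reward, next_state):
    # Q-learning backup: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
    old = Q.get((state, action), 0.0)
    future = max(Q.get((next_state, a), 0.0) for a in actions)
    Q[(state, action)] = old + alpha * (reward + gamma * future - old)

# example update for a hypothetical state encoding (light, oncoming traffic, next waypoint)
state = ('green', None, 'forward')
action = choose_action(state)
learn(state, action, 2.0, state)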


Submitting the Project

Evaluation

Your project will be reviewed by a Udacity reviewer against the Train a Smartcab to Drive project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named smartcab for ease of access:

  • The agent.py Python file with all code implemented as required in the instructed tasks.
  • The /logs/ folder which should contain five log files that were produced from your simulation and used in the analysis.
  • The smartcab.ipynb notebook file with all questions answered and all visualization cells executed and displaying results.
  • An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.

Once you have collected these files and reviewed the project rubric, proceed to the project submission page.


Submission

Train a Smartcab to Drive

In the not-so-distant future, taxicab companies across the United States no longer employ human drivers to operate their fleet of vehicles. Instead, the taxicabs are operated by self-driving agents — known as smartcabs — to transport people from one location to another within the cities those companies operate. In major metropolitan areas, such as Chicago, New York City, and San Francisco, an increasing number of people have come to rely on smartcabs to get to where they need to go as safely and efficiently as possible. Although smartcabs have become the transport of choice, concerns have arisen that a self-driving agent might not be as safe or efficient as human drivers, particularly when considering city traffic lights and other vehicles. To alleviate these concerns, your task as an employee for a national taxicab company is to use reinforcement learning techniques to construct a demonstration of a smartcab operating in real-time to prove that both safety and efficiency can be achieved.

Project Files

For this assignment, you can find the smartcab folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!

Evaluation

Your project will be reviewed by a Udacity reviewer against the Train a Smartcab to Drive project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named smartcab for ease of access:

  • The agent.py Python file with all code implemented as required in the instructed tasks.
  • The /logs/ folder which should contain five log files that were produced from your simulation and used in the analysis.
  • The smartcab.ipynb notebook file with all questions answered and all visualization cells executed and displaying results.
  • An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
I’m Ready!

When you’re ready to submit your project, click on the Submit Project button at the bottom of this page.

If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.

What’s Next?

You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!

Supporting Materials

Videos Zip File


View Submission

submission


Deep Learning

P5 Build a Digit Recognition Program

FROM MACHINE LEARNING TO DEEP LEARNING

SOFTWARE AND TOOLS

TensorFlow

Download and Setup

Method 1: Pre-built Docker container with TensorFlow and all assignments
To get started with TensorFlow quickly and work on your assignments, follow the instructions in this README.
Note: If you are on a Windows machine, Method 1 is your only option due to lack of native TensorFlow support.

(not needed) Check your GPU

Right-click Computer -> Properties -> Device Manager -> Display adapters
I use the CPU-only method.

(failed) First try from discussion at Udacity
  • Install Docker Toolbox (you can get it here). I recommend installing every optional package. ->failed
  • Create a virtual machine for your udacity tensorflow work:
    docker-machine create -d virtualbox --virtualbox-memory 2048 tensorflow
  • In a cmd.exe prompt, run
    FOR /f "tokens=*" %i IN ('docker-machine env --shell cmd tensorflow') DO %i
  • Next, run
    docker run -p 8888:8888 --name tensorflow-udacity -it b.gcr.io/tensorflow-udacity/assignments:0.5.0
  • In a browser, go to
    http://192.168.99.100:8888/tree
(failed) Second try

I have two versions of Python installed, so I will not use this one.

(failed) Third try from discussion at Udacity

Press Win+R to open a command prompt.

from sklearn import preprocessing

label_encoder = preprocessing.LabelEncoder()  # encode string labels as integers
ohe = preprocessing.OneHotEncoder()           # creating OneHotEncoder object
label_encoded_data = label_encoder.fit_transform(data['health'])
ohe.fit_transform(label_encoded_data.reshape(-1, 1))

After executing the above steps, I can use tensorflow by selecting the following option in Jupyter notebook: Kernel => Change kernel => python [conda env:py35]

Note: I used python 2.7 and jupyter notebook for the earlier assignments.

(Useful) Fourth method

Follow this video and install Ubuntu in Virtualbox.

Virtual hard disk file location: C:\Users\SSQ\VirtualBox VMs\Deep Learning Ubuntu\Deep Learning Ubuntu.vdi

Location of the shared folder: C:\Users\SSQ\virtualbox share

Follow this blog to copy files between host OS and guest OS.
For me, I use sudo mount -t vboxsf virtualbox_share /mnt/

Follow this TensorFlow

For macOS, follow this video
register mega

https://www.tensorflow.org/get_started/os_setup#pip_installation_on_windows

(success) Fifth try with pip install

Follow this website

When I type pip install tensorflow in Virtualbox (OS:Linux),
it always shows ReadTimeoutError: HTTPSConnectionPool(host='pypi.python.org', port=443): Read timed out.,
so I choose
sudo pip install --upgrade https://pypi.tuna.tsinghua.edu.cn/packages/7b/c5/a97ed48fcc878e36bb05a3ea700c077360853c0994473a8f6b0ab4c2ddd2/tensorflow-1.0.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=a7483a4da4d70cc628e9e207238f77c0
to install tensorflow

Collecting numpy>=1.11.0 (from tensorflow==1.0.0)

Downloading numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl (16.5MB)

sudo pip install --upgrade https://pypi.python.org/packages/cb/47/19e96945ee6012459e85f87728633f05b1e8791677ae64370d16ac4c849e/numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=9f9bc53d2e281831e1a75be0c09a9548

From this mirror

sudo pip install --upgrade https://mirrors.ustc.edu.cn/pypi/web/packages/cb/47/19e96945ee6012459e85f87728633f05b1e8791677ae64370d16ac4c849e/numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=9f9bc53d2e281831e1a75be0c09a9548

Try again success
pip install --index https://pypi.mirrors.ustc.edu.cn/simple/ tensorflow

Validate your installation

$ python

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

Hello, TensorFlow!

(success) Sixth try with Anaconda install

Follow this website

Download Anaconda inside the VirtualBox VM.
For me it shows ReadTimeoutError.

So I decide to download it on my host OS and copy it to my shared folder C:\Users\SSQ\virtualbox share, and I can find it under /mnt in my Linux system.
type bash /mnt/Anaconda2-4.3.0-Linux-x86_64.sh
type yes
Anaconda2 will now be installed into this location:
/home/ssq/anaconda2

Press ENTER to confirm the location
Press CTRL-C to abort the installation
Or specify a different location below

click Enter

Do you wish the installer to prepend the Anaconda2 install location
to PATH in your /home/ssq/.bashrc ? [yes|no]

yes

Open new terminal and type conda create -n tensorflow

Fetching package metadata …

CondaHTTPError: HTTP None None for url
Elapsed: None

An HTTP error occurred when trying to retrieve this URL.
ConnectionError(ReadTimeoutError(“HTTPSConnectionPool(host=’repo.continuum.io’, port=443): Read timed out.”,),)

Try again conda create -n tensorflow

source activate tensorflow

From ssq@ssq-VirtualBox:~$ to (tensorflow) ssq@ssq-VirtualBox:~$

Success
y

pip install --index https://pypi.mirrors.ustc.edu.cn/simple/ tensorflow

Validate your installation

$ python

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

Hello, TensorFlow!

source deactivate tensorflow

From (tensorflow) ssq@ssq-VirtualBox:~$ to ssq@ssq-VirtualBox:~$

(failed) Docker install

Install docker with sudo apt install docker.io

Ubuntu 16.04 GPU TensorFlow install
  • mkdir – make a directory on the Linux server
  • rm – remove a directory on the Linux server
  • wget – download files
  • sh – run a .sh install script
  • install Anaconda
    reference to this article
  • install cuDNN
  • install CUDA
  • install tensorflow-gpu directly; reference to this page

Assignments

Assignments

Note: If you installed TensorFlow using the pre-built Docker container, you do not have to fetch assignment code separately. Just run the container and access the notebooks as mentioned here.

Get Starter Code

Starter code packages (Jupyter notebooks) are available from the main TensorFlow repository. Clone it and navigate to the tensorflow/examples/udacity/ directory.

This contains all the Jupyter notebooks (.ipynb files) as well as a Docker spec (Dockerfile).

Run

Depending on how you installed TensorFlow, do one of the following to run assignment code:

**Pip/virtualenv**: Run `ipython notebook` and open http://localhost:8888 in a browser.
**Docker**: As mentioned in README.md:
    First build a local Docker container: docker build -t $USER/assignments .
    Run the container: docker run -p 8888:8888 -it --rm $USER/assignments
    Now find your VM's IP using docker-machine ip default (say, 192.168.99.100) and open http://192.168.99.100:8888

You should be able to see a list of notebooks, one for each assignment. Click on the appropriate one to open it, and follow the inline instructions.

And you’re ready to start exploring! To get further help on each assignment, navigate to the appropriate node.

If you want to learn more about iPython (or Jupyter) notebooks, visit jupyter.org.

Assignment 1: notMNIST

Assignment 1: notMNIST

Preprocess notMNIST data and train a simple logistic regression model on it

notMNIST dataset samples

Starter Code

Open the iPython notebook for this assignment (1_notmnist.ipynb), and follow the instructions to implement and run each indicated step. Some of the early steps that preprocess the data have been implemented for you.

Evaluation

This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer the posed questions (save your responses as markdown in the notebook).

In the end, you should have a model trained on the notMNIST dataset, which is able to recognize a subset of English letters in different fonts. How accurately does your model predict the correct labels on the test dataset?
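
For orientation, here is a rough sketch of that final step, using random arrays in place of the notebook's notMNIST data (the shapes and variable names are assumptions): flatten each 28x28 image into a 784-dimensional vector and fit scikit-learn's LogisticRegression.

import numpy as np
from sklearn.linear_model import LogisticRegression

# stand-ins for the notebook's train/test arrays
train_images = np.random.rand(1000, 28, 28)
train_labels = np.random.randint(0, 10, 1000)
test_images = np.random.rand(200, 28, 28)
test_labels = np.random.randint(0, 10, 200)

def flatten(images):
    # (n_samples, 28, 28) -> (n_samples, 784)
    return images.reshape(images.shape[0], -1)

clf = LogisticRegression()
clf.fit(flatten(train_images), train_labels)
print('accuracy on held-out data: %.3f' % clf.score(flatten(test_images), test_labels))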

Problem 2: Verify normalized images

Note how imshow() displays an image using a color map. You can change this using the cmap parameter. Check out more options in the API reference.
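
For example (a small illustrative snippet, with a random array standing in for one normalized image), passing cmap='gray' makes the letter intensities easier to verify than the default colour map:

import numpy as np
import matplotlib.pyplot as plt

image = np.random.rand(28, 28)   # stand-in for one normalized notMNIST image
plt.imshow(image, cmap='gray')   # override the default colour map
plt.colorbar()
plt.show()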

DEEP NEURAL NETWORKS

Deep Neural Networks

Assignment 2: SGD

Assignment 2: Stochastic Gradient Descent

Train a fully-connected network using Gradient Descent and Stochastic Gradient Descent

Note: The assignments in this course build on each other, so please finish Assignment 1 before attempting this.

Starter Code

Open the iPython notebook for this assignment (2_fullyconnected.ipynb), and follow the instructions to implement and/or run each indicated step. Some steps have been implemented for you.

Evaluation

This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).

Your new model should perform better than the one you developed for Assignment 1. Also, the time required to train using Stochastic Gradient Descent (SGD) should be considerably less than simple Gradient Descent (GD).
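
As a point of reference, here is a minimal mini-batch SGD loop in the style of this assignment, with synthetic data and a single linear layer; the shapes, batch size, step count, and learning rate are assumptions for illustration only.

import numpy as np
import tensorflow as tf

num_features, num_labels, batch_size = 784, 10, 128
train_data = np.random.rand(1024, num_features).astype(np.float32)
train_labels = np.eye(num_labels)[np.random.randint(0, num_labels, 1024)].astype(np.float32)

graph = tf.Graph()
with graph.as_default():
    tf_x = tf.placeholder(tf.float32, shape=(batch_size, num_features))
    tf_y = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    weights = tf.Variable(tf.truncated_normal([num_features, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))
    logits = tf.matmul(tf_x, weights) + biases
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_y))
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200):
        # feed a different small batch each step instead of the whole training set
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        feed = {tf_x: train_data[offset:offset + batch_size],
                tf_y: train_labels[offset:offset + batch_size]}
        _, l = sess.run([optimizer, loss], feed_dict=feed)
    print('final mini-batch loss: %f' % l)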

Errors

Error:

ValueError: Only call softmax_cross_entropy_with_logits with named arguments (labels=…, logits=…, …)

Fix:

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

Assignment 3: Regularization

Assignment 3: Regularization

Use regularization techniques to improve a deep learning model

Note: The assignments in this course build on each other, so please finish them in order.

Starter Code

Open the iPython notebook for this assignment (3_regularization.ipynb), and follow the instructions to implement and run each indicated step. Some steps have been implemented for you.

Evaluation

This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).

Try to apply the different regularization techniques you have learnt, and compare their results. Which seems to work better? Is one clearly better than the others?
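
Two of the techniques worth comparing are L2 weight penalties and dropout. Here is a hedged sketch of how both can be wired into the loss in TF 1.x terms; the layer sizes, the 0.001 penalty, and the keep probability are illustrative assumptions.

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))
keep_prob = tf.placeholder(tf.float32)   # e.g. 0.5 while training, 1.0 at evaluation

w1 = tf.Variable(tf.truncated_normal([784, 1024], stddev=0.1))
b1 = tf.Variable(tf.zeros([1024]))
w2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))

hidden = tf.nn.relu(tf.matmul(x, w1) + b1)
hidden = tf.nn.dropout(hidden, keep_prob)   # dropout: randomly zero activations during training
logits = tf.matmul(hidden, w2) + b2

# L2 regularization: penalize large weights on top of the data loss
data_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
loss = data_loss + 0.001 * (tf.nn.l2_loss(w1) + tf.nn.l2_loss(w2))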

Error in VirtualBox

Error:

Unable to allocate and lock memory. The virtual machine will be paused. Please close applications to free up memory or close the VM.
Error ID:
HostMemoryLow
Severity:
Non-fatal error

How to fix:
Close unnecessary processes in the host OS to free up memory.
Restart the VM.

CONVOLUTIONAL NEURAL NETWORKS

Readings

Readings
For a closer look at the arithmetic behind convolution, and how it is affected by your choice of padding scheme, stride and other parameters, please refer to this illustrated guide:

V. Dumoulin and F. Visin, A guide to convolution arithmetic for deep learning.

Assignment 4: Convolutional Models

Design and train a Convolutional Neural Network

Note: The assignments in this course build on each other, so please finish them in order.

Starter Code

Open the iPython notebook for this assignment (4_convolutions.ipynb), and follow the instructions to implement and run each indicated step. Some steps have been implemented for you.

Evaluation

This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).

Improve the model by experimenting with its structure - how many layers, how they are connected, stride, pooling, etc. For more efficient training, try applying techniques such as dropout and learning rate decay. What does your final architecture look like?
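
For reference, here is a sketch of one convolution-plus-pooling block of the kind this notebook asks you to experiment with; the 5x5 patch size, output depth of 16, strides, and padding are assumptions, not the assignment's required values.

import tensorflow as tf

images = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))              # greyscale 28x28 inputs
filters = tf.Variable(tf.truncated_normal([5, 5, 1, 16], stddev=0.1))     # 5x5 patches, output depth 16
conv_biases = tf.Variable(tf.zeros([16]))

# stride-1 convolution with SAME padding keeps the 28x28 spatial size
conv = tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding='SAME')
hidden = tf.nn.relu(conv + conv_biases)

# 2x2 max pooling halves the spatial dimensions to 14x14
pooled = tf.nn.max_pool(hidden, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')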

DEEP MODELS FOR TEXT AND SEQUENCES

tSNE

Laurens van der Maaten and Geoffrey Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 2008. Vol. 9, pp. 2579-2605.

Assignment 5: Word2Vec and CBOW

Assignment 5: Word2Vec and CBOW

Train a skip-gram model on Text8 data and visualize the output

Note: The assignments in this course build on each other, so please finish them in order.

Starter Code

Open the iPython notebook for this assignment (5_word2vec.ipynb), and follow the instructions to implement and run each indicated step. The first model (Word2Vec) has been implemented for you. Using that as a reference, train a CBOW (Continuous Bag of Words) model.

Evaluation

This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).

How does your CBOW model perform compared to the given Word2Vec model?
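
One way to adapt the given skip-gram pipeline to CBOW is to look up the embeddings of all the context words and average them before the loss. Below is a hedged sketch of that input step only; the vocabulary size, embedding size, window, and placeholder name are assumptions, not the notebook's exact code.

import tensorflow as tf

vocabulary_size, embedding_size, context_window = 50000, 128, 2

# ids of the 2 * context_window words surrounding the target word
train_context = tf.placeholder(tf.int32, shape=(None, 2 * context_window))
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))

# skip-gram looks up a single word; CBOW looks up the whole context and averages it
context_embed = tf.nn.embedding_lookup(embeddings, train_context)   # (batch, 2*window, embed)
cbow_input = tf.reduce_mean(context_embed, axis=1)                  # (batch, embed)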

Open

sudo mount -t vboxsf virtualbox_share /mnt/
jupyter notebook

run

TypeError:
Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'.

python -c 'import tensorflow as tf; print(tf.__version__)'
1.0.0

Method:
tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases, train_labels, embed, num_sampled, vocabulary_size)

Reference:
https://github.com/nlintz/TensorFlow-Tutorials/issues/80

Assignment 6: LSTMs

Assignment 6: LSTMs

Train a Long Short-Term Memory network to predict character sequences

Note: The assignments in this course build on each other, so please finish them in order.

Starter Code

Open the iPython notebook for this assignment (6_lstm.ipynb), and follow the instructions to implement and run each indicated step. A basic LSTM model has been provided; improve it by solving the given problems.

Evaluation

This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).

What changes did you make to use bigrams as input instead of individual characters? Were you able to implement the sequence-to-sequence LSTM? If so, what additional challenges did you have to solve?

Run

AttributeError:
'module' object has no attribute 'concat_v2'

The offending line:

# Classifier.
logits = tf.nn.xw_plus_b(tf.concat_v2(outputs, 0), w, b)

Replacing tf.concat_v2 with tf.concat in the classifier is not enough on its own, because the loss still calls tf.concat_v2:

# Classifier.
logits = tf.nn.xw_plus_b(tf.concat(outputs, 0), w, b)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits, tf.concat_v2(train_labels, 0)))

ValueError:
Only call softmax_cross_entropy_with_logits with named arguments (labels=…, logits=…, …)

With both calls switched to tf.concat, the positional arguments to softmax_cross_entropy_with_logits still raise the error above:

# Classifier.
logits = tf.nn.xw_plus_b(tf.concat(outputs, 0), w, b)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits, tf.concat(train_labels, 0)))

Method

Use tf.concat in both places and pass logits and labels as named arguments:

# Classifier.
logits = tf.nn.xw_plus_b(tf.concat(outputs, 0), w, b)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits=logits, labels=tf.concat(train_labels, 0)))

Have a try

pip uninstall tensorflow

pip install --ignore-installed --upgrade https://mirrors.ustc.edu.cn/pypi/web/packages/01/c5/adefd2d5c83e6d8b4a8efa5dd00e44dc05de317b744fb58aef6d8366ce2b/tensorflow-0.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=ebcd1b32ccf2279bfa688542cbdad5fb

sudo pip install --index https://mirrors.ustc.edu.cn/pypi/web/packages/01/c5/adefd2d5c83e6d8b4a8efa5dd00e44dc05de317b744fb58aef6d8366ce2b/tensorflow-0.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=ebcd1b32ccf2279bfa688542cbdad5fb

sudo pip install --upgrade https://mirrors.ustc.edu.cn/pypi/web/packages/01/c5/adefd2d5c83e6d8b4a8efa5dd00e44dc05de317b744fb58aef6d8366ce2b/tensorflow-0.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=ebcd1b32ccf2279bfa688542cbdad5fb

sudo pip install --index https://mirrors.ustc.edu.cn/pypi/web/packages/01/c5/adefd2d5c83e6d8b4a8efa5dd00e44dc05de317b744fb58aef6d8366ce2b/tensorflow-0.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=ebcd1b32ccf2279bfa688542cbdad5fb

PROJECT

(new) Deep Learning

MACHINE LEARNING TO DEEP LEARNING

Deep Learning

Deep Learning

Up to this point you’ve been introduced to a number of different learning schemes that take place in machine learning. You’ve seen supervised learning, where we try to extrapolate labels for new data given labelled data we already have. You’ve seen unsupervised learning, where we try to classify data into groups and extract new information hidden in the data. Lastly, you’ve seen reinforcement learning, where we try to create a model that learns the rules of an environment to best maximize its return or reward.

In this lesson, you’ll learn about a relatively new branch of machine learning called deep learning, which attempts to model high-level abstractions about data using networks of graphs. Deep learning, much like the other branches of machine learning you’ve seen, is similarly focused on learning representations in data. Additionally, modeling high-level abstractions about data is very similar to artificial intelligence — the idea that knowledge can be represented and acted upon intelligently.

What You’ll Watch and Learn

For this lesson, you’ll want to learn about algorithms that help you to construct the deep network graphs necessary to model high-level abstractions about data. In addition, you’ll also want to learn how to construct deep models that can interpret and identify words and letters in text — just like how a human reads! To do that, you’ll work on Udacity’s Deep Learning course, co-authored by Google. Vincent Vanhoucke, Principal Scientist at Google Brain, will be your instructor for this lesson. With Vincent as your guide, you’ll learn the ins and outs of Deep Learning and TensorFlow, which is Google’s Deep Learning framework.

Deep Learning What You’ll Do

In this lesson, you’ll learn how you can develop algorithms that are suitable to model high-level abstractions of data and create a type of “intelligence” that is able to use this abstraction for processing new information. First, you’ll learn about deep neural networks — artificial neural networks that have multiple hidden layers of information between its input and output. Next, you’ll learn about convolutional neural networks — a different flavor of neural networks that are modeled after biological processes like visual and aural feedback. Finally, you’ll learn about deep models for sequence learning — models that can “understand” written and spoken language and text.

The underlying lesson from these concepts is that, with enough data and time to learn, we can develop intelligent agents that think and act in many of the same ways we as humans do. Being able to model complex human behaviors and tasks like driving a car, processing spoken language, or even building a winning strategy for the game of Go, is a task that could not be done without use of deep learning.

Software and Tools

TensorFlow

TensorFlow

We will be using TensorFlow™, an open-source library developed by Google, to build deep learning models throughout the course. Coding will be in Python 2.7 using iPython notebooks, which you should be familiar with.

Download and Setup

Method 1: Pre-built Docker container with TensorFlow and all assignments
To get started with TensorFlow quickly and work on your assignments, follow the instructions in this README.

Note: If you are on a Windows machine, Method 1 is your only option due to lack of native TensorFlow support.

– OR –

Method 2: Install TensorFlow on your computer (Linux or Mac OS X only), then fetch assignment code separately
Follow the instructions to download and setup TensorFlow. Choose one of the three ways to install:

Pip: Install TensorFlow directly on your computer. You need to have Python 2.7 and pip installed; and this may impact other Python packages that you may have.
Virtualenv: Install TensorFlow in an isolated (virtual) Python environment. You need to have Python 2.7 and virtualenv installed; this will not affect Python packages in any other environment.
Docker: Run TensorFlow in an isolated Docker container (virtual machine) on your computer. You need to have Vagrant, Docker and virtualization software like VirtualBox installed; this will keep TensorFlow completely isolated from the rest of your computer, but may require more memory to run.
Links: Tutorials, How-Tos, Resources, Source code, Stack Overflow

INTRO TO TENSORFLOW

Intro to TensorFlow

What is Deep Learning

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/INt1nULYPak.mp4


Solving Problems - Big and Small

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/WHcRQMGSbqg.mp4


Let’s Get Started

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ySIDqaXLhHw.mp4


Installing TensorFlow


Throughout this lesson, you’ll apply your knowledge of neural networks on real datasets using TensorFlow (link for China), an open source Deep Learning library created by Google.

You’ll use TensorFlow to classify images from the notMNIST dataset - a dataset of images of English letters from A to J. You can see a few example images below.

Your goal is to automatically detect the letter based on the image in the dataset. You’ll be working on your own computer for this lab, so, first things first, install TensorFlow!

Install

As usual, we’ll be using Conda to install TensorFlow. You might already have a TensorFlow environment, but check to make sure you have all the necessary packages.

OS X or Linux

Run the following commands to setup your environment:

conda create -n tensorflow python=3.5
source activate tensorflow
conda install pandas matplotlib jupyter notebook scipy scikit-learn
pip install tensorflow

Windows

To install on Windows, run the following in your console or Anaconda shell:

conda create -n tensorflow python=3.5
activate tensorflow
conda install pandas matplotlib jupyter notebook scipy scikit-learn
pip install tensorflow

Hello, world!

Try running the following code in your Python console to make sure you have TensorFlow properly installed. The console will print “Hello, world!” if TensorFlow is installed. Don’t worry about understanding what it does. You’ll learn about it in the next section.

import tensorflow as tf

# Create TensorFlow object called tensor
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
Try

Open cmd as administrator.

conda create -n tensorflow python=3.5

C:\windows\system32>conda create -n tensorflow python=3.5
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment C:\Program Files\Anaconda2\envs\ten
sorflow:

The following NEW packages will be INSTALLED:

    pip:            9.0.1-py35_1
    python:         3.5.3-0
    setuptools:     27.2.0-py35_1
    vs2015_runtime: 14.0.25123-0
    wheel:          0.29.0-py35_0

Proceed ([y]/n)? y

vs2015_runtime 100% |###############################| Time: 0:00:02 776.58 kB/s
python-3.5.3-0 100% |###############################| Time: 0:01:29 361.95 kB/s
setuptools-27. 100% |###############################| Time: 0:00:00   1.09 MB/s
wheel-0.29.0-p 100% |###############################| Time: 0:00:00   1.55 MB/s
pip-9.0.1-py35 100% |###############################| Time: 0:00:01 997.36 kB/s
#
# To activate this environment, use:
# > activate tensorflow
#
# To deactivate this environment, use:
# > deactivate tensorflow
#
# * for power-users using bash, you must source
#

activate tensorflow

(tensorflow) C:\windows\system32>

conda install pandas matplotlib jupyter notebook scipy scikit-learn
Fetching package metadata ………..
Solving package specifications: .

Package plan for installation in environment C:\Program Files\Anaconda2\envs\ten
sorflow:

The following NEW packages will be INSTALLED:

bleach:              1.5.0-py35_0
colorama:            0.3.7-py35_0
cycler:              0.10.0-py35_0
decorator:           4.0.11-py35_0
entrypoints:         0.2.2-py35_1
html5lib:            0.999-py35_0
icu:                 57.1-vc14_0        [vc14]
ipykernel:           4.5.2-py35_0
ipython:             5.3.0-py35_0
ipython_genutils:    0.1.0-py35_0
ipywidgets:          6.0.0-py35_0
jinja2:              2.9.5-py35_0
jpeg:                9b-vc14_0          [vc14]
jsonschema:          2.5.1-py35_0
jupyter:             1.0.0-py35_3
jupyter_client:      5.0.0-py35_0
jupyter_console:     5.1.0-py35_0
jupyter_core:        4.3.0-py35_0
libpng:              1.6.27-vc14_0      [vc14]
markupsafe:          0.23-py35_2
matplotlib:          2.0.0-np112py35_0
mistune:             0.7.4-py35_0
mkl:                 2017.0.1-0
nbconvert:           5.1.1-py35_0
nbformat:            4.3.0-py35_0
notebook:            4.4.1-py35_0
numpy:               1.12.0-py35_0
openssl:             1.0.2k-vc14_0      [vc14]
pandas:              0.19.2-np112py35_1
pandocfilters:       1.4.1-py35_0
path.py:             10.1-py35_0
pickleshare:         0.7.4-py35_0
prompt_toolkit:      1.0.13-py35_0
pygments:            2.2.0-py35_0
pyparsing:           2.1.4-py35_0
pyqt:                5.6.0-py35_2
python-dateutil:     2.6.0-py35_0
pytz:                2016.10-py35_0
pyzmq:               16.0.2-py35_0
qt:                  5.6.2-vc14_3       [vc14]
qtconsole:           4.2.1-py35_2
scikit-learn:        0.18.1-np112py35_1
scipy:               0.19.0-np112py35_0
simplegeneric:       0.8.1-py35_1
sip:                 4.18-py35_0
six:                 1.10.0-py35_0
testpath:            0.3-py35_0
tk:                  8.5.18-vc14_0      [vc14]
tornado:             4.4.2-py35_0
traitlets:           4.3.2-py35_0
wcwidth:             0.1.7-py35_0
widgetsnbextension:  2.0.0-py35_0
win_unicode_console: 0.5-py35_0
zlib:                1.2.8-vc14_3       [vc14]

Proceed ([y]/n)? y

mkl-2017.0.1-0 100% |###############################| Time: 0:04:46 470.85 kB/s
icu-57.1-vc14_ 100% |###############################| Time: 0:01:28 403.91 kB/s
jpeg-9b-vc14_0 100% |###############################| Time: 0:00:00 379.04 kB/s
openssl-1.0.2k 100% |###############################| Time: 0:00:13 393.72 kB/s
tk-8.5.18-vc14 100% |###############################| Time: 0:00:04 473.45 kB/s
zlib-1.2.8-vc1 100% |###############################| Time: 0:00:00 503.24 kB/s
colorama-0.3.7 100% |###############################| Time: 0:00:00 622.07 kB/s
decorator-4.0. 100% |###############################| Time: 0:00:00 690.00 kB/s
entrypoints-0. 100% |###############################| Time: 0:00:00 625.06 kB/s
ipython_genuti 100% |###############################| Time: 0:00:00 597.35 kB/s
jsonschema-2.5 100% |###############################| Time: 0:00:00 503.91 kB/s
libpng-1.6.27- 100% |###############################| Time: 0:00:01 432.48 kB/s
markupsafe-0.2 100% |###############################| Time: 0:00:00 520.82 kB/s
mistune-0.7.4- 100% |###############################| Time: 0:00:00 441.53 kB/s
numpy-1.12.0-p 100% |###############################| Time: 0:00:10 354.48 kB/s
pandocfilters- 100% |###############################| Time: 0:00:00 363.00 kB/s
path.py-10.1-p 100% |###############################| Time: 0:00:00 293.57 kB/s
pygments-2.2.0 100% |###############################| Time: 0:00:04 302.43 kB/s
pyparsing-2.1. 100% |###############################| Time: 0:00:00 270.85 kB/s
pytz-2016.10-p 100% |###############################| Time: 0:00:00 233.38 kB/s
pyzmq-16.0.2-p 100% |###############################| Time: 0:00:02 266.24 kB/s
simplegeneric- 100% |###############################| Time: 0:00:00 373.89 kB/s
sip-4.18-py35_ 100% |###############################| Time: 0:00:00 268.95 kB/s
six-1.10.0-py3 100% |###############################| Time: 0:00:00 409.00 kB/s
testpath-0.3-p 100% |###############################| Time: 0:00:00 329.72 kB/s
tornado-4.4.2- 100% |###############################| Time: 0:00:02 253.88 kB/s
wcwidth-0.1.7- 100% |###############################| Time: 0:00:00 329.53 kB/s
win_unicode_co 100% |###############################| Time: 0:00:00 302.28 kB/s
cycler-0.10.0- 100% |###############################| Time: 0:00:00 393.21 kB/s
html5lib-0.999 100% |###############################| Time: 0:00:00 260.77 kB/s
jinja2-2.9.5-p 100% |###############################| Time: 0:00:01 250.23 kB/s
pickleshare-0. 100% |###############################| Time: 0:00:00 326.15 kB/s
prompt_toolkit 100% |###############################| Time: 0:00:01 281.79 kB/s
python-dateuti 100% |###############################| Time: 0:00:00 280.81 kB/s
qt-5.6.2-vc14_ 100% |###############################| Time: 0:02:03 469.10 kB/s
scipy-0.19.0-n 100% |###############################| Time: 0:00:20 656.15 kB/s
traitlets-4.3. 100% |###############################| Time: 0:00:00 418.63 kB/s
bleach-1.5.0-p 100% |###############################| Time: 0:00:00 508.29 kB/s
ipython-5.3.0- 100% |###############################| Time: 0:00:02 406.32 kB/s
jupyter_core-4 100% |###############################| Time: 0:00:00 365.87 kB/s
pandas-0.19.2- 100% |###############################| Time: 0:00:13 548.51 kB/s
pyqt-5.6.0-py3 100% |###############################| Time: 0:00:08 586.14 kB/s
scikit-learn-0 100% |###############################| Time: 0:00:16 282.73 kB/s
jupyter_client 100% |###############################| Time: 0:00:00 250.90 kB/s
matplotlib-2.0 100% |###############################| Time: 0:00:17 508.36 kB/s
nbformat-4.3.0 100% |###############################| Time: 0:00:00   1.41 MB/s
ipykernel-4.5. 100% |###############################| Time: 0:00:00   1.39 MB/s
nbconvert-5.1. 100% |###############################| Time: 0:00:00   1.42 MB/s
jupyter_consol 100% |###############################| Time: 0:00:00 397.64 kB/s
notebook-4.4.1 100% |###############################| Time: 0:00:06 890.12 kB/s
qtconsole-4.2. 100% |###############################| Time: 0:00:00 705.98 kB/s
widgetsnbexten 100% |###############################| Time: 0:00:01 727.40 kB/s
ipywidgets-6.0 100% |###############################| Time: 0:00:00 632.13 kB/s
jupyter-1.0.0- 100% |###############################| Time: 0:00:00 665.76 kB/s
ERROR conda.core.link:_execute_actions(330): An error occurred while installing
package 'defaults::qt-5.6.2-vc14_3'.
UnicodeDecodeError('utf8', '\xd2\xd1\xb8\xb4\xd6\xc6         1 \xb8\xf6\xce\xc4\
xbc\xfe\xa1\xa3\r\n', 0, 1, 'invalid continuation byte')
Attempting to roll back.



UnicodeDecodeError('utf8', '\xd2\xd1\xb8\xb4\xd6\xc6         1 \xb8\xf6\xce\xc4\
xbc\xfe\xa1\xa3\r\n', 0, 1, 'invalid continuation byte')

(tensorflow) C:\windows\system32>pip install tensorflow

Hello, Tensor World!

Hello, Tensor World!

Let’s analyze the Hello World script you ran. For reference, I’ve added the code below.

import tensorflow as tf

# Create TensorFlow object called hello_constant
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
Tensor

In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a tensor. In the case of hello_constant = tf.constant('Hello World!'), hello_constant is a 0-dimensional string tensor, but tensors come in a variety of sizes as shown below:

# A is a 0-dimensional int32 tensor
A = tf.constant(1234) 
# B is a 1-dimensional int32 tensor
B = tf.constant([123,456,789]) 
# C is a 2-dimensional int32 tensor
C = tf.constant([[123, 456, 789], [222, 333, 444]])

tf.constant() is one of many TensorFlow operations you will use in this lesson. The tensor returned by tf.constant() is called a constant tensor, because the value of the tensor never changes.

Session

TensorFlow’s API is built around the idea of a computational graph, a way of visualizing a mathematical process which you learned about in the MiniFlow lesson. Let’s take the TensorFlow code you ran and turn that into a graph:

A “TensorFlow Session”, as shown above, is an environment for running a graph. The session is in charge of allocating the operations to GPU(s) and/or CPU(s), including remote machines. Let’s see how you use it.

with tf.Session() as sess:
    output = sess.run(hello_constant)

The code has already created the tensor, hello_constant, from the previous lines. The next step is to evaluate the tensor in a session.

The code creates a session instance, sess, using tf.Session. The sess.run() function then evaluates the tensor and returns the results.

Quiz: TensorFlow Input

Input

In the last section, you passed a tensor into a session and it returned the result. What if you want to use a non-constant? This is where tf.placeholder() and feed_dict come into play. In this section, you’ll go over the basics of feeding data into TensorFlow.

tf.placeholder()

Sadly you can’t just set x to your dataset and put it in TensorFlow, because over time you’ll want your TensorFlow model to take in different datasets with different parameters. You need tf.placeholder()!

tf.placeholder() returns a tensor that gets its value from data passed to the tf.session.run() function, allowing you to set the input right before the session runs.

Session’s feed_dict
x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})

Use the feed_dict parameter in tf.session.run() to set the placeholder tensor. The above example shows the tensor x being set to the string "Hello World". It’s also possible to set more than one tensor using feed_dict as shown below.

x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Test String', y: 123, z: 45.67})

Note: If the data passed to the feed_dict doesn’t match the tensor type and can’t be cast into the tensor type, you’ll get the error “ValueError: invalid literal for...”.

Quiz

Let’s see how well you understand tf.placeholder() and feed_dict. The code below throws an error, but I want you to make it return the number 123. Change line 11, so that the code returns the number 123.

Note: The quizzes are running TensorFlow version 0.12.1. However, all the code used in this course is compatible with version 1.0. We’ll be upgrading our in class quizzes to the newest version in the near future.

# Solution is available in the other "solution.py" tab
import tensorflow as tf


def run():
    output = None
    x = tf.placeholder(tf.int32)

    with tf.Session() as sess:
        # TODO: Feed the x tensor 123
        output = sess.run(x, feed_dict={x: 123})

    return output

Quiz: TensorFlow Math

TensorFlow Math

Getting the input is great, but now you need to use it. You’re going to use basic math functions that everyone knows and loves - add, subtract, multiply, and divide - with tensors. (There are many more math functions you can check out in the documentation.)

Addition
x = tf.add(5, 2)  # 7

You’ll start with the add function. The tf.add() function does exactly what you expect it to do. It takes in two numbers, two tensors, or one of each, and returns their sum as a tensor.
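
For instance, you can mix a tensor and a plain Python number, and the result is still a tensor that you evaluate in a session:

a = tf.add(tf.constant(5), 2)  # still a tensor

with tf.Session() as sess:
    print(sess.run(a))  # 7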

Subtraction and Multiplication

Here’s an example with subtraction and multiplication.

x = tf.subtract(10, 4) # 6
y = tf.multiply(2, 5)  # 10

The x tensor will evaluate to 6, because 10 - 4 = 6. The y tensor will evaluate to 10, because 2 * 5 = 10. That was easy!

Converting types

It may be necessary to convert between types to make certain operators work together. For example, if you tried the following, it would fail with an exception:

tf.subtract(tf.constant(2.0),tf.constant(1))  # Fails with ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int32:

That’s because the constant 1 is an integer but the constant 2.0 is a floating point value and subtract expects them to match.

In cases like these, you can either make sure your data is all of the same type, or you can cast a value to another type. In this case, converting the 2.0 to an integer before subtracting, like so, will give the correct result:

tf.subtract(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1))   # 1
Quiz

Let’s apply what you learned to convert an algorithm to TensorFlow. The code below is a simple algorithm using division and subtraction. Convert the following algorithm in regular Python to TensorFlow and print the results of the session. You can use tf.constant() for the values 10, 2, and 1.

# Solution is available in the other "solution.py" tab
import tensorflow as tf

# TODO: Convert the following to TensorFlow:
x = tf.constant(10)
y = tf.constant(2)
# tf.divide performs true division and returns a float64 tensor,
# so the constant 1 is cast to float64 before subtracting
z = tf.subtract(tf.divide(x, y), tf.cast(tf.constant(1), tf.float64))
# TODO: Print z from a session
with tf.Session() as sess:
    output = sess.run(z)
    print(output)

Transition to Classification


Good job! You’ve accomplished a lot. In particular, you did the following:

  • Ran operations in tf.Session.
  • Created a constant tensor with tf.constant().
  • Used tf.placeholder() and feed_dict to get input.
  • Applied the tf.add(), tf.subtract(), tf.multiply(), and tf.divide() functions using numeric data.
  • Learned about casting between types with tf.cast().

You know the basics of TensorFlow, so let’s take a break and get back to the theory of neural networks. In the next few videos, you’re going to learn about one of the most popular applications of neural networks - classification.

Supervised Classification

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/XTGsutypAPE.mp4


Training Your Logistic Classifier

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/WQsdr1EJgz8.mp4


Quiz: TensorFlow Linear Function

Linear functions in TensorFlow

The most common operation in neural networks is calculating the linear combination of inputs, weights, and biases. As a reminder, we can write the output of the linear operation as

$$y = xW + b$$

Here, W is a matrix of the weights connecting two layers. The output y, the input x, and the biases b are all vectors.

Weights and Bias in TensorFlow

The goal of training a neural network is to modify weights and biases to best predict the labels. In order to use weights and bias, you’ll need a Tensor that can be modified. This leaves out tf.placeholder() and tf.constant(), since those Tensors can’t be modified. This is where the tf.Variable class comes in.

tf.Variable()
x = tf.Variable(5)

The tf.Variable class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually. You’ll use the tf.global_variables_initializer() function to initialize the state of all the Variable tensors.

Initialization
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

The tf.global_variables_initializer() call returns an operation that will initialize all TensorFlow variables from the graph. You call the operation using a session to initialize all the variables as shown above. Using the tf.Variable class allows us to change the weights and bias, but an initial value needs to be chosen.

Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights keeps the model from becoming stuck in the same place every time you train it. You’ll learn more about this in the next lesson, when you study gradient descent.

Similarly, choosing weights from a normal distribution prevents any one weight from overwhelming other weights. You’ll use the tf.truncated_normal() function to generate random numbers from a normal distribution.

tf.truncated_normal()
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))

The tf.truncated_normal() function returns a tensor with random values from a normal distribution whose magnitude is no more than 2 standard deviations from the mean.

Since the weights are already helping prevent the model from getting stuck, you don’t need to randomize the bias. Let’s use the simplest solution, setting the bias to 0.

tf.zeros()
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))

The tf.zeros() function returns a tensor with all zeros.
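
Putting the two together, here’s a minimal sketch (using the same illustrative n_features and n_labels from above) that creates the weights and bias, initializes them, and inspects them:

import tensorflow as tf

n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
bias = tf.Variable(tf.zeros(n_labels))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(weights).shape)  # (120, 5)
    print(sess.run(bias))           # [ 0.  0.  0.  0.  0.]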

Linear Classifier Quiz

A subset of the MNIST dataset
You’ll be classifying the handwritten numbers 0, 1, and 2 from the MNIST dataset using TensorFlow. The above is a small sample of the data you’ll be training on. Notice how some of the 1s are written with a serif at the top and at different angles. The similarities and differences will play a part in shaping the weights of the model.
Left: Weights for labeling 0. Middle: Weights for labeling 1. Right: Weights for labeling 2.

The images above are trained weights for each label (0, 1, and 2). The weights display the unique properties of each digit they have found. Complete this quiz to train your own weights using the MNIST dataset.

Instructions
  1. Open quiz.py.
    1. Implement get_weights to return a tf.Variable of weights
    2. Implement get_biases to return a tf.Variable of biases
    3. Implement xW + b in the linear function
  2. Open sandbox.py
    1. Initialize all weights

Since xW in xW + b is matrix multiplication, you have to use the tf.matmul() function instead of tf.multiply(). Don’t forget that order matters in matrix multiplication, so tf.matmul(a, b) is not the same as tf.matmul(b, a).
quiz.py
# Solution is available in the other "quiz_solution.py" tab
import tensorflow as tf

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    # TODO: Return weights
    return tf.Variable(tf.truncated_normal((n_features, n_labels)))


def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    # TODO: Return biases
    return tf.Variable(tf.zeros(n_labels))


def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    # TODO: Linear Function (xW + b)
    return tf.add(tf.matmul(input, w), b)

sandbox.py
# Solution is available in the other "sandbox_solution.py" tab
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from quiz import get_weights, get_biases, linear


def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

    # In order to make quizzes run faster, we're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

        # Add features and labels if it's for the first <n>th labels
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels


# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

with tf.Session() as session:
    # TODO: Initialize session variables
    session.run(tf.global_variables_initializer())
    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    # You'll learn more about this in future lessons.
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))

Quiz: TensorFlow Softmax

TensorFlow Softmax

You might remember in the Intro to TFLearn lesson we used the softmax function to calculate class probabilities as output from the network. The softmax function squashes its inputs, typically called logits or logit scores, to be between 0 and 1 and also normalizes the outputs such that they all sum to 1. This means the output of the softmax function is equivalent to a categorical probability distribution. It’s the perfect function to use as the output activation for a network predicting multiple classes.
Example of the softmax function at work.
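
For reference, the softmax of a vector of logits $z$ with $K$ entries is

$$\text{softmax}(z)_{j} = \frac{e^{z_{j}}}{\sum_{k=1}^{K} e^{z_{k}}}$$

so larger logits get exponentially more of the probability mass, and the outputs always sum to 1.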

TensorFlow Softmax

We’re using TensorFlow to build neural networks and, appropriately, there’s a function for calculating softmax.

x = tf.nn.softmax([2.0, 1.0, 0.2])

Easy as that! tf.nn.softmax() implements the softmax function for you. It takes in logits and returns softmax activations.
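
As with any other tensor, you evaluate it inside a session; a quick sketch:

import tensorflow as tf

softmax = tf.nn.softmax([2.0, 1.0, 0.2])

with tf.Session() as sess:
    # Prints roughly [0.65, 0.24, 0.11], which sums to 1
    print(sess.run(softmax))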

Quiz

Use the softmax function in the quiz below to return the softmax of the logits.

quiz.py
# Solution is available in the other "solution.py" tab
import tensorflow as tf


def run():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)

    # TODO: Calculate the softmax of the logits
    # softmax =     
    softmax = tf.nn.softmax(logits)
    with tf.Session() as sess:
        # TODO: Feed in the logit data
        # output = sess.run(softmax,    )
        output = sess.run(softmax,feed_dict={logits:logit_data}    )
    return output

One-Hot Encoding

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/phYsxqlilUk.mp4

One-Hot Encoding With Scikit-Learn
Transforming your labels into one-hot encoded vectors is pretty simple with scikit-learn using LabelBinarizer. Check it out below!

import numpy as np
from sklearn import preprocessing

# Example labels
labels = np.array([1,5,3,2,1,4,2,1,3])

# Create the encoder
lb = preprocessing.LabelBinarizer()

# Here the encoder finds the classes and assigns one-hot vectors 
lb.fit(labels)

# And finally, transform the labels into one-hot encoded vectors
lb.transform(labels)
>>> array([[1, 0, 0, 0, 0],
           [0, 0, 0, 0, 1],
           [0, 0, 1, 0, 0],
           [0, 1, 0, 0, 0],
           [1, 0, 0, 0, 0],
           [0, 0, 0, 1, 0],
           [0, 1, 0, 0, 0],
           [1, 0, 0, 0, 0],
           [0, 0, 1, 0, 0]])

Quiz: TensorFlow Cross Entropy

Cross Entropy in TensorFlow

In the Intro to TFLearn lesson we discussed using cross entropy as the cost function for classification with one-hot encoded labels. Again, TensorFlow has a function to do the cross entropy calculations for us.
Cross entropy loss function

Let’s take what you learned from the video and create a cross entropy function in TensorFlow. To create a cross entropy function in TensorFlow, you’ll need to use two new functions:

Reduce Sum
x = tf.reduce_sum([1, 2, 3, 4, 5])  # 15

The tf.reduce_sum() function takes an array of numbers and sums them together.

Natural Log
x = tf.log(100.0)  # 4.60517

This function does exactly what you would expect it to do. tf.log() takes the natural log of a number.

Quiz

Print the cross entropy using softmax_data and one_hot_data.

quiz.py
# Solution is available in the other "solution.py" tab
import tensorflow as tf

softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

# TODO: Print cross entropy from session
cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

with tf.Session() as sess:
    print(sess.run(cross_entropy, feed_dict={softmax: softmax_data, one_hot: one_hot_data}))

0.356675
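
As a sanity check: since the one-hot vector selects only the first class, the sum collapses to a single term, $-1.0 \times \ln(0.7) \approx 0.3567$, which matches the printed result.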

Minimizing Cross Entropy

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/YrDMXFhvh9E.mp4

Transition into Practical Aspects of Learning

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/bKqkRFOOKoA.mp4

Quiz: Numerical Stability

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_SbGcOS-jcQ.mp4

a = 1000000000
for i in range(1000000):
    a = a + 1e-6
print(a - 1000000000)
0.953674316406
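
For contrast, running the same loop with a starting value near 1 keeps essentially all of the precision, which is the point of the video: keep your values in a small, consistent range. A quick sketch (the exact trailing digits may vary):

a = 1
for i in range(1000000):
    a = a + 1e-6
print(a - 1)
# prints a value very close to 1, e.g. roughly 0.99999999992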

Normalized Inputs and Initial Weights

Measuring Performance

Optimizing a Logistic Classifier

Stochastic Gradient Descent

Momentum and Learning Rate Decay

Parameter Hyperspace

Quiz: Mini-batch

Mini-batching

In this section, you’ll go over what mini-batching is and how to apply it in TensorFlow.

Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.

Mini-batching is computationally inefficient, since you can’t calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

It’s also quite useful combined with SGD. The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you’re performing SGD with each batch.
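
One simple way to do the per-epoch shuffle (a sketch; the helper name is just illustrative) is to permute the features and labels together so each feature keeps its label:

import numpy as np

def shuffle_together(features, labels):
    # Use one permutation for both arrays so the pairs stay aligned
    permutation = np.random.permutation(len(features))
    return np.array(features)[permutation], np.array(labels)[permutation]

# At the start of each epoch:
# train_features, train_labels = shuffle_together(train_features, train_labels)
# ...then split the shuffled data into mini-batches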

Let’s look at the MNIST dataset with weights and a bias to see if your machine can handle it.

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
Question 1

Calculate the memory size of train_features, train_labels, weights, and bias in bytes. Ignore memory for overhead, just calculate the memory required for the stored data.

You may have to look up how much memory a float32 requires, using this link.

train_features Shape: (55000, 784) Type: float32

train_labels Shape: (55000, 10) Type: float32

weights Shape: (784, 10) Type: float32

bias Shape: (10,) Type: float32

How many bytes of memory does train_features need?
55000 * 784 * 4 = 172480000

How many bytes of memory does train_labels need?
55000 * 10 * 4 = 2200000

How many bytes of memory does weights need?
784 * 10 * 4 = 31360

How many bytes of memory does bias need?
10 * 4 = 40

The total memory space required for the inputs, weights and bias is around 174 megabytes, which isn’t that much memory. You could train this whole dataset on most CPUs and GPUs.

But the larger datasets that you’ll use in the future will be measured in gigabytes or more. It’s possible to purchase more memory, but it’s expensive. A Titan X GPU with 12 GB of memory costs over $1,000.

Instead, in order to run large models on your machine, you’ll learn how to use mini-batching.

Let’s look at how you implement mini-batching in TensorFlow.

TensorFlow Mini-batching

In order to use mini-batching, you must first divide your data into batches.

Unfortunately, it’s sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you’d like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not evenly divide into 1000, you’d wind up with 7 batches of 128 samples, and 1 batch of 104 samples. (7 * 128 + 1 * 104 = 1000)

In that case, the size of the batches would vary, so you need to take advantage of TensorFlow’s tf.placeholder() function to receive the varying batch sizes.

Continuing the example, if each sample had n_input = 784 features and n_classes = 10 possible labels, the dimensions for features would be [None, n_input] and labels would be [None, n_classes].

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

What does None do here?

The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.

Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.
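
Here’s a small sketch showing that the same placeholder accepts both batch sizes (the zero arrays are just stand-ins for real data):

import numpy as np
import tensorflow as tf

n_input = 784
features = tf.placeholder(tf.float32, [None, n_input])

with tf.Session() as sess:
    full_batch = np.zeros((128, n_input), dtype=np.float32)
    last_batch = np.zeros((104, n_input), dtype=np.float32)
    # Both runs succeed because the first dimension is left as None
    print(sess.run(tf.shape(features), feed_dict={features: full_batch}))  # [128 784]
    print(sess.run(tf.shape(features), feed_dict={features: last_batch}))  # [104 784]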

Question 2

Using the parameters below, how many batches are there, and what is the last batch size?

features is (50000, 400)

labels is (50000, 10)

batch_size is 128

How many batches are there?
ceil(50000 / 128) = 391

What is the last batch size?
50000 % 128 = 80
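
You can check these answers with a couple of lines of Python:

import math

n_samples = 50000
batch_size = 128
print(int(math.ceil(n_samples / float(batch_size))))  # 391 batches in total
print(n_samples % batch_size)                         # the last batch has 80 samples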

Now that you know the basics, let’s learn how to implement mini-batching.

Question 3

Implement the batches function to batch features and labels. The function should return each batch with a maximum size of batch_size. To help you with the quiz, look at the following example output of a working batches function.

# 4 Samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]
# 4 Samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

example_batches = batches(3, example_features, example_labels)

The example_batches variable would be the following:

[
    # 2 batches:
    #   First is a batch of size 3.
    #   Second is a batch of size 1
    [
        # First Batch is size 3
        [
            # 3 samples of features.
            # There are 4 features per sample.
            ['F11', 'F12', 'F13', 'F14'],
            ['F21', 'F22', 'F23', 'F24'],
            ['F31', 'F32', 'F33', 'F34']
        ], [
            # 3 samples of labels.
            # There are 2 labels per sample.
            ['L11', 'L12'],
            ['L21', 'L22'],
            ['L31', 'L32']
        ]
    ], [
        # Second Batch is size 1.
        # Since batch size is 3, there is only one sample left from the 4 samples.
        [
            # 1 sample of features.
            ['F41', 'F42', 'F43', 'F44']
        ], [
            # 1 sample of labels.
            ['L41', 'L42']
        ]
    ]
]

Implement the batches function in the “quiz.py” file below.

“quiz.py”
import math
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    # TODO: Implement batching
    output_batches = []

    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)

    return output_batches
“sandbox.py”
from quiz import batches
from pprint import pprint

# 4 Samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]
# 4 Samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

# PPrint prints data structures like 2d arrays, so they are easier to read
pprint(batches(3, example_features, example_labels))

Let’s use mini-batching to feed batches of MNIST features and labels into a linear model.

Set the batch size and run the optimizer over all the batches with the batches function. The recommended batch size is 128. If you have memory restrictions, feel free to make it smaller.

“quiz.py”
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # TODO: Train optimizer on all batches
    # for batch_features, batch_labels in ______
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

The accuracy is low, but you probably know that you could improve it by training on the dataset more than once. You’ll go over this subject in the next section, where we talk about “epochs”.

Epochs

Epochs

An epoch is a single forward and backward pass of the whole dataset. This is used to increase the accuracy of the model without requiring more data. This section will cover epochs in TensorFlow and how to choose the right number of epochs.

The following TensorFlow code trains a model using 10 epochs.

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches  # Helper function created in Mini-batching section


def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 10
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):

        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

Running the code will output the following:

Epoch: 0    - Cost: 11.0     Valid Accuracy: 0.204
Epoch: 1    - Cost: 9.95     Valid Accuracy: 0.229
Epoch: 2    - Cost: 9.18     Valid Accuracy: 0.246
Epoch: 3    - Cost: 8.59     Valid Accuracy: 0.264
Epoch: 4    - Cost: 8.13     Valid Accuracy: 0.283
Epoch: 5    - Cost: 7.77     Valid Accuracy: 0.301
Epoch: 6    - Cost: 7.47     Valid Accuracy: 0.316
Epoch: 7    - Cost: 7.2      Valid Accuracy: 0.328
Epoch: 8    - Cost: 6.96     Valid Accuracy: 0.342
Epoch: 9    - Cost: 6.73     Valid Accuracy: 0.36 
Test Accuracy: 0.3801000118255615

Each epoch attempts to move to a lower cost, leading to better accuracy.

This model continues to improve accuracy up to Epoch 9. Let’s increase the number of epochs to 100.

...
Epoch: 79   - Cost: 0.111    Valid Accuracy: 0.86
Epoch: 80   - Cost: 0.11     Valid Accuracy: 0.869
Epoch: 81   - Cost: 0.109    Valid Accuracy: 0.869
....
Epoch: 85   - Cost: 0.107    Valid Accuracy: 0.869
Epoch: 86   - Cost: 0.107    Valid Accuracy: 0.869
Epoch: 87   - Cost: 0.106    Valid Accuracy: 0.869
Epoch: 88   - Cost: 0.106    Valid Accuracy: 0.869
Epoch: 89   - Cost: 0.105    Valid Accuracy: 0.869
Epoch: 90   - Cost: 0.105    Valid Accuracy: 0.869
Epoch: 91   - Cost: 0.104    Valid Accuracy: 0.869
Epoch: 92   - Cost: 0.103    Valid Accuracy: 0.869
Epoch: 93   - Cost: 0.103    Valid Accuracy: 0.869
Epoch: 94   - Cost: 0.102    Valid Accuracy: 0.869
Epoch: 95   - Cost: 0.102    Valid Accuracy: 0.869
Epoch: 96   - Cost: 0.101    Valid Accuracy: 0.869
Epoch: 97   - Cost: 0.101    Valid Accuracy: 0.869
Epoch: 98   - Cost: 0.1      Valid Accuracy: 0.869
Epoch: 99   - Cost: 0.1      Valid Accuracy: 0.869
Test Accuracy: 0.8696000006198883

From looking at the output above, you can see the model doesn’t increase the validation accuracy after epoch 80. Let’s see what happens when we increase the learning rate.

learn_rate = 0.1

Epoch: 76   - Cost: 0.214    Valid Accuracy: 0.752
Epoch: 77   - Cost: 0.21     Valid Accuracy: 0.756
Epoch: 78   - Cost: 0.21     Valid Accuracy: 0.756
...
Epoch: 85   - Cost: 0.207    Valid Accuracy: 0.756
Epoch: 86   - Cost: 0.209    Valid Accuracy: 0.756
Epoch: 87   - Cost: 0.205    Valid Accuracy: 0.756
Epoch: 88   - Cost: 0.208    Valid Accuracy: 0.756
Epoch: 89   - Cost: 0.205    Valid Accuracy: 0.756
Epoch: 90   - Cost: 0.202    Valid Accuracy: 0.756
Epoch: 91   - Cost: 0.207    Valid Accuracy: 0.756
Epoch: 92   - Cost: 0.204    Valid Accuracy: 0.756
Epoch: 93   - Cost: 0.206    Valid Accuracy: 0.756
Epoch: 94   - Cost: 0.202    Valid Accuracy: 0.756
Epoch: 95   - Cost: 0.2974   Valid Accuracy: 0.756
Epoch: 96   - Cost: 0.202    Valid Accuracy: 0.756
Epoch: 97   - Cost: 0.2996   Valid Accuracy: 0.756
Epoch: 98   - Cost: 0.203    Valid Accuracy: 0.756
Epoch: 99   - Cost: 0.2987   Valid Accuracy: 0.756
Test Accuracy: 0.7556000053882599

Looks like the learning rate was increased too much. The final accuracy was lower, and it stopped improving earlier. Let’s stick with the previous learning rate, but change the number of epochs to 80.

Epoch: 65   - Cost: 0.122    Valid Accuracy: 0.868
Epoch: 66   - Cost: 0.121    Valid Accuracy: 0.868
Epoch: 67   - Cost: 0.12     Valid Accuracy: 0.868
Epoch: 68   - Cost: 0.119    Valid Accuracy: 0.868
Epoch: 69   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 70   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 71   - Cost: 0.117    Valid Accuracy: 0.868
Epoch: 72   - Cost: 0.116    Valid Accuracy: 0.868
Epoch: 73   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 74   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 75   - Cost: 0.114    Valid Accuracy: 0.868
Epoch: 76   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 77   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 78   - Cost: 0.112    Valid Accuracy: 0.868
Epoch: 79   - Cost: 0.111    Valid Accuracy: 0.868
Epoch: 80   - Cost: 0.111    Valid Accuracy: 0.869
Test Accuracy: 0.86909999418258667

The accuracy only reached 0.86, but that could be because the learning rate was too high. Lowering the learning rate would require more epochs, but could ultimately achieve better accuracy.

In the upcoming TensorFlow Lab, you’ll get the opportunity to choose your own learning rate, epoch count, and batch size to improve the model’s accuracy.

More about epochs on Quora.

INTRO TO NEURAL NETWORKS

Intro to Neural Networks

Introducing Luis

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/nto-stLuN6M.mp4

Logistic Regression Quiz

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kSs6O3R7JUI.mp4

Logistic Regression Answer

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1iNylA3fJDs.mp4

Neural Networks

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Mqogpnp1lrU.mp4

Perceptron

Perceptron

Now you’ve seen how a simple neural network makes decisions: by taking in input data, processing that information, and finally, producing an output in the form of a decision! Let’s take a deeper dive into the university admission example and learn more about how this input data is processed.

Data, like test scores and grades, is fed into a network of interconnected nodes. These individual nodes are called perceptrons or neurons, and they are the basic unit of a neural network. Each one looks at input data and decides how to categorize that data. In the example above, the input either passes a threshold for grades and test scores or doesn’t, and so the two categories are: yes (passed the threshold) and no (didn’t pass the threshold). These categories then combine to form a decision – for example, if both nodes produce a “yes” output, then this student gains admission into the university.

Let’s zoom in even further and look at how a single perceptron processes input data.

The perceptron above is one of the two perceptrons from the video that help determine whether or not a student is accepted to a university. It decides whether a student’s grades are high enough to be accepted to the university. You might be wondering: “How does it know whether grades or test scores are more important in making this acceptance decision?” Well, when we initialize a neural network, we don’t know what information will be most important in making a decision. It’s up to the neural network to learn for itself which data is most important and adjust how it considers that data.

It does this with something called weights.

Weights

When input data comes into a perceptron, it gets multiplied by a weight value that is assigned to this particular input. For example, the perceptron above has two inputs, test scores and grades, so it has two associated weights that can be adjusted individually. These weights start out as random values, and as the neural network learns more about what kind of input data leads to a student being accepted into a university, the network adjusts the weights based on any errors in categorization that the previous weights resulted in. This is called training the neural network.

A higher weight means the neural network considers that input more important than other inputs, and a lower weight means that the data is considered less important. An extreme example would be if test scores had no effect at all on university acceptance; then the weight of the test score input data would be zero and it would have no effect on the output of the perceptron.

Summing the Input Data

So, each input to a perceptron has an associated weight that represents its importance, and these weights are determined during the learning process of a neural network, called training. In the next step, the weighted input data is summed to produce a single value that will help determine the final output - whether a student is accepted to a university or not. Let’s see a concrete example of this.
We weight x_test by w_test and add it to x_grades weighted by w_grades.
When writing equations related to neural networks, the weights will always be represented by some type of the letter w. It will usually look like a W when it represents a matrix of weights or a w when it represents an individual weight, and it may include some additional information in the form of a subscript to specify which weights (you’ll see more on that next). But remember, when you see the letter w, think weights.

In this example, we’ll use $w_{grades}$ for the weight of grades and $w_{test}$ for the weight of test. For the image above, let’s say that the weights are: $w_{grades} = -1$, $w_{test} = -0.2$. You don’t have to be concerned with the actual values, but their relative values are important. $w_{grades}$ is 5 times larger than $w_{test}$, which means the neural network considers the grades input 5 times more important than test in determining whether a student will be accepted into a university.

The perceptron applies these weights to the inputs and sums them in a process known as linear combination. In our case, this looks like $w_{grades} \cdot x_{grades} + w_{test} \cdot x_{test} = -1 \cdot x_{grades} - 0.2 \cdot x_{test}$.

Now, to make our equation less wordy, let’s replace the explicit names with numbers. Let’s use 1 for grades and 2 for tests. So now our equation becomes $w_{1} \cdot x_{1} + w_{2} \cdot x_{2}$.

In this example, we just have 2 simple inputs: grades and tests. Let’s imagine we instead had m different inputs and we labeled them $x_{1}, x_{2}, \dots, x_{m}$. Let’s also say that the weight corresponding to $x_{1}$ is $w_{1}$ and so on. In that case, we would express the linear combination succinctly as:

$$\sum_{i=1}^{m} w_{i} \cdot x_{i}$$

Here, the Greek letter Sigma $\Sigma$ is used to represent summation. It simply means to evaluate the equation to the right multiple times and add up the results. In this case, the equation it will sum is $w_{i} \cdot x_{i}$.

But where do we get $w_{i}$ and $x_{i}$?

$\sum_{i=1}^{m}$ means to iterate over all i values, from 1 to m.

So to put it all together, $\sum_{i=1}^{m} w_{i} \cdot x_{i}$ means the following:

  • Start at i=1
  • Evaluate $w_{1} \cdot x_{1}$ and remember the result
  • Move to i=2
  • Evaluate $w_{2} \cdot x_{2}$ and add this result to $w_{1} \cdot x_{1}$
  • Continue repeating that process until i=m, where m is the number of inputs.

One last thing: you’ll see equations written many different ways, both here and when reading on your own. For example, you will often just see $\sum_{i}$ instead of $\sum_{i=1}^{m}$. The first is simply a shorter way of writing the second. That is, if you see a summation without a starting number or a defined end value, it just means perform the sum for all of them. And sometimes, if the value to iterate over can be inferred, you’ll see it as just $\Sigma$. Just remember they’re all the same thing: $\sum_{i=1}^{m} w_{i} \cdot x_{i} = \sum_{i} w_{i} \cdot x_{i} = \sum w_{i} \cdot x_{i}$.

Calculating the Output with an Activation Function

Finally, the result of the perceptron’s summation is turned into an output signal! This is done by feeding the linear combination into an activation function.

Activation functions are functions that decide, given the inputs into the node, what the node’s output should be. Because it’s the activation function that decides the actual output, we often refer to the outputs of a layer as its “activations”.

One of the simplest activation functions is the Heaviside step function. This function returns a 0 if the linear combination is less than 0. It returns a 1 if the linear combination is positive or equal to zero. The Heaviside step function is shown below, where h is the calculated linear combination:


In the university acceptance example above, we used the weights $w_{grades} = -1$, $w_{test} = -0.2$. Since $w_{grades}$ and $w_{test}$ are negative values, the activation function will only return a 1 if grades and test are 0! This is because the range of values from the linear combination using these weights and inputs are (−∞, 0] (i.e. negative infinity to 0, including 0 itself).

It’s easiest to see this with an example in two dimensions. In the following graph, imagine any points along the line or in the shaded area represent all the possible inputs to our node. Also imagine that the value along the y-axis is the result of performing the linear combination on these inputs and the appropriate weights. It’s this result that gets passed to the activation function.

Now remember that the step activation function returns 1 for any inputs greater than or equal to zero. As you can see in the image, only one point has a y-value greater than or equal to zero – the point right at the origin, (0,0):

Now, we certainly want more than one possible grade/test combination to result in acceptance, so we need to adjust the results passed to our activation function so it activates – that is, returns 1 – for more inputs. Specifically, we need to find a way so all the scores we’d like to consider acceptable for admissions produce values greater than or equal to zero when linearly combined with the weights into our node.

One way to get our function to return 1 for more inputs is to add a value to the results of our linear combination, called a bias.

A bias, represented in equations as b, lets us move values in one direction or another.

For example, the following diagram shows the previous hypothetical function with an added bias of +3. The blue shaded area shows all the values that now activate the function. But notice that these are produced with the same inputs as the values shown shaded in grey – just adjusted higher by adding the bias term:

Of course, with neural networks we won’t know in advance what values to pick for biases. That’s ok, because just like the weights, the bias can also be updated and changed by the neural network during training. So after adding a bias, we now have a complete perceptron formula:
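
In equation form, the perceptron output is the step function applied to the biased linear combination:

$$f(x_{1}, x_{2}, \dots, x_{m}) = \begin{cases} 0 & \text{if } b + \sum_{i} w_{i} \cdot x_{i} < 0 \\ 1 & \text{if } b + \sum_{i} w_{i} \cdot x_{i} \geq 0 \end{cases}$$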

This formula returns 1 if the input $x_{1}, x_{2}, \dots, x_{m}$ belongs to the accepted-to-university category or returns 0 if it doesn’t. The input is made up of one or more real numbers, each one represented by $x_{i}$, where m is the number of inputs.

Then the neural network starts to learn! Initially, the weights $w_{i}$ and bias (b) are assigned a random value, and then they are updated using a learning algorithm like gradient descent. The weights and biases change so that the next training example is more accurately categorized, and patterns in data are “learned” by the neural network.

Now that you have a good understanding of perceptrons, let’s put that knowledge to use. In the next section, you’ll create the AND perceptron from the Neural Networks video by setting the values for weights and bias.

AND Perceptron Quiz

What are the weights and bias for the AND perceptron?

Set the weights (weight1, weight2) and bias bias to the correct values that calculate AND operation as shown above.
In this case, there are two inputs as seen in the table above (let’s call the first column input1 and the second column input2), and based on the perceptron formula, we can calculate the output.

First, the linear combination will be the sum of the weighted inputs: linear_combination = weight1*input1 + weight2*input2 then we can put this value into the biased Heaviside step function, which will give us our output (0 or 1):

If you still need a hint, think of a concrete example like so:

Consider input1 and input2 both = 1, for an AND perceptron, we want the output to also equal 1! The output is determined by the weights and Heaviside step function such that

output = 1, if  weight1*input1 + weight2*input2 + bias >= 0
or
output = 0, if  weight1*input1 + weight2*input2 + bias < 0

So, how can you choose the values for weights and bias so that if both inputs = 1, the output = 1?
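
If you want to check a candidate answer by hand, a tiny sketch like the one below works. The particular numbers here are just one example that satisfies AND, not the only valid choice:

weight1 = 1.0
weight2 = 1.0
bias = -1.5

for input1, input2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    linear_combination = weight1 * input1 + weight2 * input2 + bias
    output = 1 if linear_combination >= 0 else 0
    print(input1, input2, output)
# Only (1, 1) produces 1, which matches the AND truth table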

Gradient Descent

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/29PmNG7fuuM.mp4
Gradient is another term for rate of change or slope. If you need to brush up on this concept, check out Khan Academy’s great lectures on the topic.

Gradient Descent: The Math

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/7sxA5Ap8AWM.mp4

Gradient Descent: The Code

Implementing Gradient Descent

Multilayer Perceptrons

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Rs9petvTBLk.mp4
Khan Academy’s introduction to vectors.
Khan Academy’s introduction to matrices.

Backpropagation

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/MZL97-2joxQ.mp4

Implementing Backpropagation

Further Reading

From Andrej Karpathy: Yes, you should understand backprop
Also from Andrej Karpathy, a lecture from Stanford’s CS231n course

DEEP NEURAL NETWORKS

Two-Layer Neural Network

Multilayer Neural Networks

In this lesson, you’ll learn how to build multilayer neural networks with TensorFlow. Adding a hidden layer to a network allows it to model more complex functions. Also, using a non-linear activation function on the hidden layer lets it model non-linear functions.

We’ll use the ReLU, or rectified linear unit, as a non-linear activation function. The ReLU function is 0 for negative inputs and x for all inputs x > 0.
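
Concretely, tf.nn.relu() zeroes out negative entries and passes positive ones through unchanged; a quick sketch:

import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.nn.relu([-1.0, 0.0, 2.0])))  # [ 0.  0.  2.]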

Next, you’ll see how a ReLU hidden layer is implemented in TensorFlow.

Quiz: TensorFlow ReLUs

# Quiz Solution
# Note: You can't run code in this tab
import tensorflow as tf
output = None
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# TODO: Print session results
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits))

Deep Neural Network in TensorFlow

Deep Neural Network in TensorFlow

You’ve seen how to build a logistic classifier using TensorFlow. Now you’re going to see how to use the logistic classifier to build a deep neural network.

Step by Step

In the following walkthrough, we’ll step through TensorFlow code written to classify the handwritten digits in the MNIST database. If you would like to run the network on your computer, the file is provided here. You can find this and many more examples of TensorFlow at Aymeric Damien’s GitHub repository.

Code

TensorFlow MNIST

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

You’ll use the MNIST dataset provided by TensorFlow, which batches and One-Hot encodes the data for you.

Learning Parameters

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

The focus here is on the architecture of multilayer neural networks, not parameter tuning, so here we’ll just give you the learning parameters.

Hidden Layer Parameters

n_hidden_layer = 256 # layer number of features

The variable n_hidden_layer determines the size of the hidden layer in the neural network. This is also known as the width of a layer.

Weights and Biases

# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

Deep neural networks use multiple layers, with each layer requiring its own weight and bias. The 'hidden_layer' weight and bias are for the hidden layer. The 'out' weight and bias are for the output layer. If the neural network were deeper, there would be weights and biases for each additional layer.

Input

# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

The MNIST data is made up of 28px by 28px images with a single channel. The tf.reshape() function above reshapes the 28px by 28px matrices in x into row vectors of 784 pixels.

Multilayer Perceptron

Multilayer Perceptron

# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),\
    biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])

You’ve seen the linear function tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer']) before, also known as xW + b. Combining linear functions together using a ReLU will give you a two layer network.

Optimizer

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

This is the same optimization technique used in the Intro to TensorFlow lab.

Session

# Initializing the variables
init = tf.global_variables_initializer()


# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

The MNIST library in TensorFlow provides the ability to receive the dataset in batches. Calling the mnist.train.next_batch() function returns a subset of the training data.

Deeper Neural Network

Deeper Neural Network
That’s it! Going from one layer to two is easy. Adding more layers to the network allows you to solve more complicated problems. In the next video, you’ll see how changing the number of layers can affect your network.

Training a Deep Learning Network

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/CsB7yUtMJyk.mp4

Save and Restore TensorFlow Models

Save and Restore TensorFlow Models

Training a model can take hours. But once you close your TensorFlow session, you lose all the trained weights and biases. If you were to reuse the model in the future, you would have to train it all over again!

Fortunately, TensorFlow gives you the ability to save your progress using a class called tf.train.Saver. This class provides the functionality to save any tf.Variable to your file system.

Saving Variables

Let’s start with a simple example of saving weights and bias Tensors. For the first example you’ll just save two variables. Later examples will save all the weights in a practical model.

import tensorflow as tf

# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

    # Save the model
    saver.save(sess, save_file)

Weights:

[[-0.97990924 1.03016174 0.74119264]

[-0.82581609 -0.07361362 -0.86653847]]

Bias:

[ 1.62978125 -0.37812829 0.64723819]

The Tensors weights and bias are set to random values using the tf.truncated_normal() function. The values are then saved to the save_file location, “model.ckpt”, using the tf.train.Saver.save() function. (The “.ckpt” extension stands for “checkpoint”.)

If you’re using TensorFlow 0.11.0RC1 or newer, a file called “model.ckpt.meta” will also be created. This file contains the TensorFlow graph.

Loading Variables

Now that the Tensor Variables are saved, let’s load them back into a new model.

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weight:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

Weights:

[[-0.97990924 1.03016174 0.74119264]

[-0.82581609 -0.07361362 -0.86653847]]

Bias:

[ 1.62978125 -0.37812829 0.64723819]

You’ll notice you still need to create the weights and bias Tensors in Python. The tf.train.Saver.restore() function loads the saved data into weights and bias.

Since tf.train.Saver.restore() sets all the TensorFlow Variables, you don’t need to call tf.global_variables_initializer().

Save a Trained Model

Let’s see how to train a model and save its weights.

First start with a model:

# Remove previous Tensors and Operations
tf.reset_default_graph()

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Let’s train that model, then save the weights:

import math

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')

Epoch 0 - Validation Accuracy: 0.06859999895095825

Epoch 10 - Validation Accuracy: 0.20239999890327454

Epoch 20 - Validation Accuracy: 0.36980000138282776

Epoch 30 - Validation Accuracy: 0.48820000886917114

Epoch 40 - Validation Accuracy: 0.5601999759674072

Epoch 50 - Validation Accuracy: 0.6097999811172485

Epoch 60 - Validation Accuracy: 0.6425999999046326

Epoch 70 - Validation Accuracy: 0.6733999848365784

Epoch 80 - Validation Accuracy: 0.6916000247001648

Epoch 90 - Validation Accuracy: 0.7113999724388123

Trained Model Saved.

Load a Trained Model

Let’s load the weights and bias from memory, then check the test accuracy.

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {}'.format(test_accuracy))

Test Accuracy: 0.7229999899864197

That’s it! You now know how to save and load a trained model in TensorFlow. Let’s look at loading weights and biases into modified models in the next section.

Finetuning

Loading the Weights and Biases into a New Model

Sometimes you might want to adjust, or “finetune” a model that you have already trained and saved.

However, loading saved Variables directly into a modified model can generate errors. Let’s go over how to avoid these problems.

Naming Error

TensorFlow uses a string identifier for Tensors and Operations called name. If a name is not given, TensorFlow will create one automatically. TensorFlow will give the first node the name <Type>, and then give the name <Type>_<number> for the subsequent nodes. Let’s see how this can affect loading a model with a different order of weights and bias:

import tensorflow as tf

# Remove the previous weights and bias
tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)

The code above prints out the following:

Save Weights: Variable:0

Save Bias: Variable_1:0

Load Weights: Variable_1:0

Load Bias: Variable:0

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match.

You’ll notice that the name properties for weights and bias are different than when you saved the model. This is why the code produces the “Assign requires shapes of both tensors to match” error. The code saver.restore(sess, save_file) is trying to load weight data into bias and bias data into weights.

Instead of letting TensorFlow set the name property, let’s set it manually:

import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)

print('Loaded Weights and Bias successfully.')

Save Weights: weights_0:0

Save Bias: bias_0:0

Load Weights: weights_0:0

Load Bias: bias_0:0

Loaded Weights and Bias successfully.

That worked! The Tensor names match and the data loaded correctly.
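
Another option worth knowing about (not covered above) is to hand tf.train.Saver an explicit dictionary mapping the names stored in the checkpoint to the Variables in the current graph, so restoring no longer depends on creation order or auto-generated names. Here is a minimal sketch, assuming the 'model.ckpt' checkpoint saved above with the names weights_0 and bias_0:

import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Variables created in the opposite order from when the checkpoint was saved
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))

# Map the checkpoint names to the current Variables explicitly
saver = tf.train.Saver({'weights_0': weights, 'bias_0': bias})

with tf.Session() as sess:
    saver.restore(sess, save_file)
    print('Loaded Weights and Bias via an explicit name mapping.')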

Regularization Intro

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/pECnr-5F3_Q.mp4

Regularization

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/QcJBhbuCl5g.mp4

Regularization Quiz

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/E0eEW6V0_sA.mp4

Dropout

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6DcImJS8uV8.mp4

Dropout Pt. 2

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8nG8zzJMbZw.mp4

Quiz: TensorFlow Dropout

TensorFlow Dropout


https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
Dropout is a regularization technique for reducing overfitting. The technique temporarily drops units (artificial neurons) from the network, along with all of those units’ incoming and outgoing connections. Figure 1 of the paper linked above illustrates how dropout works.

TensorFlow provides the tf.nn.dropout() function, which you can use to implement dropout.

Let’s look at an example of how to use tf.nn.dropout().

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

The code above illustrates how to apply dropout to a neural network.

The tf.nn.dropout() function takes in two parameters:

  1. hidden_layer: the tensor to which you would like to apply dropout
  2. keep_prob: the probability of keeping (i.e. not dropping) any given unit

keep_prob allows you to adjust the number of units to drop. In order to compensate for dropped units, tf.nn.dropout() multiplies all units that are kept (i.e. not dropped) by 1/keep_prob.

During training, a good starting value for keep_prob is 0.5.

During testing, use a keep_prob value of 1.0 to keep all units and maximize the power of the model.
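
As a quick illustration of these two settings, here is a minimal sketch (using a constant input rather than a real network) that feeds the same keep_prob placeholder 0.5 for a training-style pass and 1.0 for an evaluation-style pass:

import tensorflow as tf

keep_prob = tf.placeholder(tf.float32)

x = tf.constant([[1.0, 2.0, 3.0, 4.0]])
hidden = tf.nn.dropout(tf.nn.relu(x), keep_prob)

with tf.Session() as sess:
    # Training-style pass: roughly half the units are dropped,
    # and the survivors are scaled by 1/keep_prob (doubled here).
    print(sess.run(hidden, feed_dict={keep_prob: 0.5}))
    # Evaluation-style pass: every unit is kept and values are unchanged.
    print(sess.run(hidden, feed_dict={keep_prob: 1.0}))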

Quiz 1

Take a look at the code snippet below. Do you see what’s wrong?

There’s nothing wrong with the syntax; however, the test accuracy is extremely low.

...

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

...

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i in range(batches):
            ....

            sess.run(optimizer, feed_dict={
                features: batch_features,
                labels: batch_labels,
                keep_prob: 0.5})

    validation_accuracy = sess.run(accuracy, feed_dict={
        features: test_features,
        labels: test_labels,
        keep_prob: 0.5})

QUESTION 1 OF 2

What’s wrong with the above code?

Dropout doesn’t work with batching.

The keep_prob value of 0.5 is too low.

There shouldn’t be a value passed to keep_prob when testing for accuracy.

(correct)keep_prob should be set to 1.0 when evaluating validation accuracy.

Quiz 2

This quiz starts with the code from the ReLU Quiz and applies a dropout layer. Build a model with a ReLU layer and a dropout layer, using the keep_prob placeholder to pass in a probability of 0.5. Print the logits from the model.

Note: Output will be different every time the code is run. This is caused by dropout randomizing the units it drops.

“solution.py”

# Quiz Solution
# Note: You can't run code in this tab
import tensorflow as tf

hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model with Dropout
keep_prob = tf.placeholder(tf.float32)
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# TODO: Print logits from a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits, feed_dict={keep_prob: 0.5}))

[[ 1.10000002 6.60000038]
[ 0.30800003 0.7700001 ]
[ 9.56000042 4.78000021]]

CONVOLUTIONAL NEURAL NETWORKS

Intro To CNNs

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/B61jxZ4rkMs.mp4

Color

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/BdQccpMwk80.mp4

QUIZ QUESTION

What would be easier for your classifier to learn?

R, G, B
(correct)(R + G + B) / 3

Statistical Invariance

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0Hr5YwUUhr0.mp4

Convolutional Networks

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ISHGyvsT0QY.mp4

Intuition

Intuition

Let’s develop better intuition for how Convolutional Neural Networks (CNN) work. We’ll examine how humans classify images, and then see how CNNs use similar approaches.

Let’s say we wanted to classify the following image of a dog as a Golden Retriever.
An image that we'd like to classify as a Golden Retriever.

As humans, how do we do this?

One thing we do is identify certain parts of the dog, such as the nose, the eyes, and the fur. We essentially break up the image into smaller pieces, recognize the smaller pieces, and then combine those pieces to get an idea of the overall dog.

In this case, we might break down the image into a combination of the following:

  • A nose
  • Two eyes
  • Golden fur

These pieces can be seen below:
The eye of the dog.

The nose of the dog.

The fur of the dog.

Going One Step Further

But let’s take this one step further. How do we determine what exactly a nose is? A Golden Retriever nose can be seen as an oval with two black holes inside it. Thus, one way of classifying a Retriever’s nose is to break it up into smaller pieces and look for black holes (nostrils) and curves that define an oval, as shown below.
A curve that we can use to determine a nose.

A nostril that we can use to classify a nose of the dog.

Broadly speaking, this is what a CNN learns to do. It learns to recognize basic lines and curves, then shapes and blobs, and then increasingly complex objects within the image. Finally, the CNN classifies the image by combining the larger, more complex objects.

In our case, the levels in the hierarchy are:

  • Simple shapes, like ovals and dark circles
  • Complex objects (combinations of simple shapes), like eyes, nose, and fur
  • The dog as a whole (a combination of complex objects)

With deep learning, we don’t actually program the CNN to recognize these specific features. Rather, the CNN learns on its own to recognize such objects through forward propagation and backpropagation!

It’s amazing how well a CNN can learn to classify images, even though we never program the CNN with information about specific features to look for.
An example of what each layer in a CNN might recognize when classifying a picture of a dog.

A CNN might have several layers, and each layer might capture a different level in the hierarchy of objects. The first layer is the lowest level in the hierarchy, where the CNN generally classifies small parts of the image into simple shapes like horizontal and vertical lines and simple blobs of colors. The subsequent layers tend to be higher levels in the hierarchy and generally classify more complex ideas like shapes (combinations of lines), and eventually full objects like dogs.

Once again, the CNN learns all of this on its own. We don’t ever have to tell the CNN to go looking for lines or curves or noses or fur. The CNN just learns from the training set and discovers which characteristics of a Golden Retriever are worth looking for.

That’s a good start! Hopefully you’ve developed some intuition about how CNNs work.

Next, let’s look at some implementation details.

Filters

Breaking up an Image

The first step for a CNN is to break up the image into smaller pieces. We do this by selecting a width and height that defines a filter.

The filter looks at small pieces, or patches, of the image. These patches are the same size as the filter.
As shown in the previous video, a CNN uses filters to split an image into smaller patches. The size of these patches matches the filter size.

We then simply slide this filter horizontally or vertically to focus on a different piece of the image.

The amount by which the filter slides is referred to as the ‘stride’. The stride is a hyperparameter which you, the engineer, can tune. Increasing the stride reduces the size of your model by reducing the number of total patches each layer observes. However, this usually comes with a reduction in accuracy.

Let’s look at an example. In this zoomed in image of the dog, we first start with the patch outlined in red. The width and height of our filter define the size of this square.
One patch of the Golden Retriever image.

We then move the square over to the right by a given stride (2 in this case) to get another patch.
We move our square to the right by two pixels to create another patch.

What’s important here is that we are grouping together adjacent pixels and treating them as a collective.

In a normal, non-convolutional neural network, we would have ignored this adjacency. In a normal network, we would have connected every pixel in the input image to a neuron in the next layer. In doing so, we would not have taken advantage of the fact that pixels in an image are close together for a reason and have special meaning.

By taking advantage of this local structure, our CNN learns to classify local patterns, like shapes and objects, in an image.

Filter Depth

It’s common to have more than one filter. Different filters pick up different qualities of a patch. For example, one filter might look for a particular color, while another might look for a kind of object of a specific shape. The amount of filters in a convolutional layer is called the filter depth.
In the above example, a patch is connected to a neuron in the next layer. Source: Michael Nielsen.

How many neurons does each patch connect to?

That’s dependent on our filter depth. If we have a depth of k, we connect each patch of pixels to k neurons in the next layer. This gives us the height of k in the next layer, as shown below. In practice, k is a hyperparameter we tune, and most CNNs tend to pick the same starting values.

Choosing a filter depth of k connects each patch to k neurons in the next layer.

But why connect a single patch to multiple neurons in the next layer? Isn’t one neuron good enough?

Multiple neurons can be useful because a patch can have multiple interesting characteristics that we want to capture.

For example, one patch might include some white teeth, some blonde whiskers, and part of a red tongue. In that case, we might want a filter depth of at least three - one for each of teeth, whiskers, and tongue.
This patch of the dog has many interesting features we may want to capture. These include the presence of teeth, the presence of whiskers, and the pink color of the tongue.

Having multiple neurons for a given patch ensures that our CNN can learn to capture whatever characteristics the CNN learns are important.

Remember that the CNN isn’t “programmed” to look for certain characteristics. Rather, it learns on its own which characteristics to notice.

Feature Map Sizes

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lp1NrLZnCUM.mp4

What are the width, height and depth for padding = ‘same’, stride = 1?

Enter your answers in the format “width, height, depth”
28,28,8

What are the width, height and depth for padding = ‘valid’, stride = 1?

Enter your answers in the format “width, height, depth”
26,26,8

What are the width, height and depth for padding = ‘valid’, stride = 2?

Enter your answers in the format “width, height, depth”
13,13,8
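
If you’d like to verify these answers in code, here is a small sketch assuming the setup from the video: a 28x28 input, 3x3 filters, and a filter depth of 8 (the input depth of 3 used here is an assumption and doesn’t affect the output shape):

import tensorflow as tf

# Hypothetical 28x28x3 input and eight 3x3 filters
inputs = tf.placeholder(tf.float32, (None, 28, 28, 3))
filters = tf.Variable(tf.truncated_normal((3, 3, 3, 8)))

same_s1 = tf.nn.conv2d(inputs, filters, strides=[1, 1, 1, 1], padding='SAME')
valid_s1 = tf.nn.conv2d(inputs, filters, strides=[1, 1, 1, 1], padding='VALID')
valid_s2 = tf.nn.conv2d(inputs, filters, strides=[1, 2, 2, 1], padding='VALID')

print(same_s1.get_shape())   # (?, 28, 28, 8)
print(valid_s1.get_shape())  # (?, 26, 26, 8)
print(valid_s2.get_shape())  # (?, 13, 13, 8)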

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/W4xtf8LTz1c.mp4


Convolutions continued

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/utOv-BKI_vo.mp4

Convolutions Cont.

Note, a “Fully Connected” layer is a standard, non convolutional layer, where all inputs are connected to all output neurons. This is also referred to as a “dense” layer, and is what we used in the previous two lessons.

Parameters

Parameter Sharing

The weights, w, are shared across patches for a given layer in a CNN to detect the cat above regardless of where in the image it is located.

When we are trying to classify a picture of a cat, we don’t care where in the image a cat is. If it’s in the top left or the bottom right, it’s still a cat in our eyes. We would like our CNNs to also possess this ability known as translation invariance. How can we achieve this?

As we saw earlier, the classification of a given patch in an image is determined by the weights and biases corresponding to that patch.

If we want a cat that’s in the top left patch to be classified in the same way as a cat in the bottom right patch, we need the weights and biases corresponding to those patches to be the same, so that they are classified the same way.

This is exactly what we do in CNNs. The weights and biases we learn for a given output layer are shared across all patches in a given input layer. Note that as we increase the depth of our filter, the number of weights and biases we have to learn still increases, as the weights aren’t shared across the output channels.

There’s an additional benefit to sharing our parameters. If we did not reuse the same weights across all patches, we would have to learn new parameters for every single patch and hidden layer neuron pair. This does not scale well, especially for higher fidelity images. Thus, sharing parameters not only helps us with translation invariance, but also gives us a smaller, more scalable model.

Padding


A 5x5 grid with a 3x3 filter. Source: Andrej Karpathy.

Let’s say we have a 5x5 grid (as shown above) and a filter of size 3x3 with a stride of 1. What’s the width and height of the next layer? We see that we can fit at most three patches in each direction, giving us a dimension of 3x3 in our next layer. As we can see, the width and height of each subsequent layer decreases in such a scheme.

In an ideal world, we’d be able to maintain the same width and height across layers so that we can continue to add layers without worrying about the dimensionality shrinking and so that we have consistency. How might we achieve this? One way is to simply add a border of 0s to our original 5x5 image. You can see what this looks like in the below image.

The same grid with 0 padding. Source: Andrej Karpathy.

This would expand our original image to a 7x7. With this, we now see how our next layer’s size is again a 5x5, keeping our dimensionality consistent.

Dimensionality

From what we’ve learned so far, how can we calculate the number of neurons of each layer in our CNN?

Given:

  • our input layer has a width of W and a height of H
  • our convolutional layer has a filter size F
  • we have a stride of S
  • a padding of P
  • and a filter depth of K,

the following formula gives us the width of the next layer: W_out = (W−F+2P)/S+1.

The output height would be H_out = (H-F+2P)/S + 1.

And the output depth would be equal to the filter depth D_out = K.

The output volume would be W_out * H_out * D_out.

Knowing the dimensionality of each additional layer helps us understand how large our model is and how our decisions around filter size and stride affect the size of our network.
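
As a small sanity check, here is a sketch of these formulas as a helper function, using the 5x5 grid and 3x3 filter from the padding discussion above (the filter depth of 6 is just an arbitrary illustration):

def conv_output_shape(W, H, F, S, P, K):
    """Output width, height, and depth using the formulas above."""
    W_out = (W - F + 2 * P) // S + 1
    H_out = (H - F + 2 * P) // S + 1
    D_out = K
    return W_out, H_out, D_out

# 5x5 input, 3x3 filter, stride 1, no padding, filter depth 6 -> (3, 3, 6)
print(conv_output_shape(5, 5, 3, 1, 0, 6))
# Adding a padding of 1 keeps the width and height at 5 -> (5, 5, 6)
print(conv_output_shape(5, 5, 3, 1, 1, 6))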

Quiz: Convolution Output Shape

Introduction

For the next few quizzes we’ll test your understanding of the dimensions in CNNs. Understanding dimensions will help you make accurate tradeoffs between model size and performance. As you’ll see, some parameters have a much bigger impact on model size than others.

Setup

H = height, W = width, D = depth

  • We have an input of shape 32x32x3 (HxWxD)
  • 20 filters of shape 8x8x3 (HxWxD)
  • A stride of 2 for both the height and width (S)
  • Valid padding of size 1 ( P )

Recall the formula for calculating the new height or width:

new_height = (input_height - filter_height + 2 * P)/S + 1
new_width = (input_width - filter_width + 2 * P)/S + 1

Convolutional Layer Output Shape
What’s the shape of the output?

The answer format is HxWxD, so if you think the new height is 9, new width is 9, and new depth is 5, then type 9x9x5.

14x14x20

Solution: Convolution Output

Solution

The answer is 14x14x20.

We can get the new height and width with the formula resulting in:

(32 - 8 + 2 * 1)/2 + 1 = 14
(32 - 8 + 2 * 1)/2 + 1 = 14

The new depth is equal to the number of filters, which is 20.
This would correspond to the following code:

input = tf.placeholder(tf.float32, (None, 32, 32, 3))
filter_weights = tf.Variable(tf.truncated_normal((8, 8, 3, 20))) # (height, width, input_depth, output_depth)
filter_bias = tf.Variable(tf.zeros(20))
strides = [1, 2, 2, 1] # (batch, height, width, depth)
padding = 'VALID'
conv = tf.nn.conv2d(input, filter_weights, strides, padding) + filter_bias

Note the output shape of conv will be [1, 13, 13, 20]. It’s 4D to account for batch size, but more importantly, it’s not [1, 14, 14, 20]. This is because the padding algorithm TensorFlow uses is not exactly the same as the one above. An alternative algorithm is to switch padding from 'VALID' to 'SAME', which would result in an output shape of [1, 16, 16, 20]. If you’re curious how padding works in TensorFlow, read this document.
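
For reference, TensorFlow’s documented output-size rules for 'SAME' and 'VALID' padding can be written directly in code. This small sketch reproduces the [1, 13, 13, 20] and [1, 16, 16, 20] shapes mentioned above:

import math

def tf_out_size(in_size, filter_size, stride, padding):
    """Output height/width as TensorFlow computes it for 'SAME' and 'VALID' padding."""
    if padding == 'SAME':
        return int(math.ceil(in_size / float(stride)))
    return int(math.ceil((in_size - filter_size + 1) / float(stride)))

print(tf_out_size(32, 8, 2, 'VALID'))  # 13 -> output shape [1, 13, 13, 20]
print(tf_out_size(32, 8, 2, 'SAME'))   # 16 -> output shape [1, 16, 16, 20]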

Quiz: Number of Parameters

We’re now going to calculate the number of parameters of the convolutional layer. The answer from the last quiz will come into play here!

Being able to calculate the number of parameters in a neural network is useful since we want to have control over how much memory a neural network uses.

Setup

H = height, W = width, D = depth

  • We have an input of shape 32x32x3 (HxWxD)
  • 20 filters of shape 8x8x3 (HxWxD)
  • A stride of 2 for both the height and width (S)
  • Valid padding of size 1 ( P )

Output Layer

  • 14x14x20 (HxWxD)

Hint

Without parameter sharing, each neuron in the output layer must connect to each neuron in the filter. In addition, each neuron in the output layer must also connect to a single bias neuron.

Solution: Number of Parameters

Solution

There are 756560 total parameters. That’s a HUGE amount! Here’s how we calculate it:

(8 * 8 * 3 + 1) * (14 * 14 * 20) = 756560

8 * 8 * 3 is the number of weights, and we add 1 for the bias. Remember, each weight is assigned to every single part of the output (14 * 14 * 20). So we multiply these two numbers together and get the final answer.

Quiz: Parameter Sharing

Now we’d like you to calculate the number of parameters in the convolutional layer, if every neuron in the output layer shares its parameters with every other neuron in its same channel.

This is the number of parameters actually used in a convolution layer (tf.nn.conv2d()).

Setup

H = height, W = width, D = depth

  • We have an input of shape 32x32x3 (HxWxD)
  • 20 filters of shape 8x8x3 (HxWxD)
  • A stride of 2 for both the height and width (S)
  • Zero padding of size 1 (P)

Output Layer

  • 14x14x20 (HxWxD)

Hint

With parameter sharing, each neuron in an output channel shares its weights with every other neuron in that channel. So the number of parameters is equal to the number of neurons in the filter, plus a bias neuron, all multiplied by the number of channels in the output layer.

Convolution Layer Parameters 2
How many parameters does the convolution layer have (with parameter sharing)?
3860

Solution: Parameter Sharing

Solution

There are 3860 total parameters. That’s 196 times fewer parameters! Here’s how the answer is calculated:

(8 * 8 * 3 + 1) * 20 = 3840 + 20 = 3860

That’s 3840 weights and 20 biases. This should look similar to the answer from the previous quiz. The difference being it’s just 20 instead of (14 * 14 * 20). Remember, with weight sharing we use the same filter for an entire depth slice. Because of this we can get rid of 14 * 14 and be left with only 20.
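
Both counts are easy to reproduce with a few lines of arithmetic, for example:

filter_h, filter_w, input_depth = 8, 8, 3
out_h, out_w, out_depth = 14, 14, 20

# Without parameter sharing: a separate filter and bias for every output neuron
no_sharing = (filter_h * filter_w * input_depth + 1) * (out_h * out_w * out_depth)

# With parameter sharing: one filter and bias per output channel
with_sharing = (filter_h * filter_w * input_depth + 1) * out_depth

print(no_sharing)    # 756560
print(with_sharing)  # 3860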

Visualizing CNNs

Visualizing CNNs

Let’s look at an example CNN to see how it works in action.

The CNN we will look at is trained on ImageNet as described in this paper by Zeiler and Fergus. In the images below (from the same paper), we’ll see what each layer in this network detects and see how each layer detects more and more complex ideas.

Layer 1

Example patterns that cause activations in the first layer of the network. These range from simple diagonal lines (top left) to green blobs (bottom middle).

The images above are from Matthew Zeiler and Rob Fergus’ deep visualization toolbox, which lets us visualize what each layer in a CNN focuses on.

Each image in the above grid represents a pattern that causes the neurons in the first layer to activate - in other words, they are patterns that the first layer recognizes. The top left image shows a -45 degree line, while the middle top square shows a +45 degree line. These squares are shown below again for reference.
As visualized here, the first layer of the CNN can recognize -45 degree lines.

The first layer of the CNN is also able to recognize +45 degree lines, like the one above.

Let’s now see some example images that cause such activations. The below grid of images all activated the -45 degree line. Notice how they are all selected despite the fact that they have different colors, gradients, and patterns.
Example patches that activate the -45 degree line detector in the first layer.


So, the first layer of our CNN clearly picks out very simple shapes and patterns like lines and blobs.

Layer 2

A visualization of the second layer in the CNN. Notice how we are picking up more complex ideas like circles and stripes. The gray grid on the left represents how this layer of the CNN activates (or “what it sees”) based on the corresponding images from the grid on the right.

The second layer of the CNN captures complex ideas.

As you see in the image above, the second layer of the CNN recognizes circles (second row, second column), stripes (first row, second column), and rectangles (bottom right).

The CNN learns to do this on its own. There is no special instruction for the CNN to focus on more complex objects in deeper layers. That’s just how it normally works out when you feed training data into a CNN.

Layer 3

A visualization of the third layer in the CNN. The gray grid on the left represents how this layer of the CNN activates (or “what it sees”) based on the corresponding images from the grid on the right.

The third layer picks out complex combinations of features from the second layer. These include things like grids, and honeycombs (top left), wheels (second row, second column), and even faces (third row, third column).

Layer 5

A visualization of the fifth and final layer of the CNN. The gray grid on the left represents how this layer of the CNN activates (or “what it sees”) based on the corresponding images from the grid on the right.

We’ll skip layer 4, which continues this progression, and jump right to the fifth and final layer of this CNN.

The last layer picks out the highest order ideas that we care about for classification, like dog faces, bird faces, and bicycles.

On to TensorFlow

This concludes our high-level discussion of Convolutional Neural Networks.

Next you’ll practice actually building these networks in TensorFlow.

TensorFlow Convolution Layer

TensorFlow Convolution Layer

Let’s examine how to implement a CNN in TensorFlow.

TensorFlow provides the tf.nn.conv2d() and tf.nn.bias_add() functions to create your own convolutional layers.

# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])

# Weight and bias
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)

The code above uses the tf.nn.conv2d() function to compute the convolution with weight as the filter and [1, 2, 2, 1] for the strides. TensorFlow uses a stride for each input dimension, [batch, input_height, input_width, input_channels]. We are generally always going to set the stride for batch and input_channels (i.e. the first and fourth element in the strides array) to be 1.

You’ll focus on changing input_height and input_width while setting batch and input_channels to 1. The input_height and input_width strides are for striding the filter over input. This example code uses a stride of 2 with 5x5 filter over input.

The tf.nn.bias_add() function adds a 1-d bias to the last dimension in a matrix.

Explore The Design Space

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/FG7M9tWH2nQ.mp4

TensorFlow Max Pooling

TensorFlow Max Pooling

By Aphex34 (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons


The image above is an example of max pooling with a 2x2 filter and stride of 2. The four 2x2 colors represent each time the filter was applied to find the maximum value.

For example, [[1, 0], [4, 6]] becomes 6, because 6 is the maximum value in this set. Similarly, [[2, 3], [6, 8]] becomes 8.

Conceptually, the benefit of the max pooling operation is to reduce the size of the input, and allow the neural network to focus on only the most important elements. Max pooling does this by only retaining the maximum value for each filtered area, and removing the remaining values.

TensorFlow provides the tf.nn.max_pool() function to apply max pooling to your convolutional layers.

...
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
conv_layer = tf.nn.bias_add(conv_layer, bias)
conv_layer = tf.nn.relu(conv_layer)
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')

The tf.nn.max_pool() function performs max pooling with the ksize parameter as the size of the filter and the strides parameter as the length of the stride. 2x2 filters with a stride of 2x2 are common in practice.

The ksize and strides parameters are structured as 4-element lists, with each element corresponding to a dimension of the input tensor ([batch, height, width, channels]). For both ksize and strides, the batch and channel dimensions are typically set to 1.

Quiz: Pooling Intuition

The next few quizzes will test your understanding of pooling layers.

QUIZ QUESTION

A pooling layer is generally used to …

Increase the size of the output
(correct)Decrease the size of the output
(correct)Prevent overfitting

Gain information

Solution: Pooling Intuition

Solution

The correct answer is decrease the size of the output and prevent overfitting. Preventing overfitting is a consequence of reducing the output size, which in turn, reduces the number of parameters in future layers.

Recently, pooling layers have fallen out of favor. Some reasons are:

  • Recent datasets are so big and complex we’re more concerned about underfitting.
  • Dropout is a much better regularizer.
  • Pooling results in a loss of information. Think about the max pooling operation as an example. We only keep the largest of n numbers, thereby disregarding n-1 numbers completely.

Quiz: Pooling Mechanics

Setup

H = height, W = width, D = depth

We have an input of shape 4x4x5 (HxWxD)
Filter of shape 2x2 (HxW)
A stride of 2 for both the height and width (S)

Recall the formula for calculating the new height or width:

new_height = (input_height - filter_height)/S + 1
new_width = (input_width - filter_width)/S + 1

NOTE: For a pooling layer the output depth is the same as the input depth. Additionally, the pooling operation is applied individually for each depth slice.

The image below gives an example of how a max pooling layer works. In this case, the max pooling filter has a shape of 2x2. As the max pooling filter slides across the input layer, the filter will output the maximum value of the 2x2 square.


Pooling Layer Output Shape


What’s the shape of the output? Format is HxWxD.
2x2x5

Solution: Pooling Mechanics

Solution

The answer is 2x2x5. Here’s how it’s calculated using the formula:

(4 - 2)/2 + 1 = 2
(4 - 2)/2 + 1 = 2

The depth stays the same.
Here’s the corresponding code:

input = tf.placeholder(tf.float32, (None, 4, 4, 5))
filter_shape = [1, 2, 2, 1]
strides = [1, 2, 2, 1]
padding = 'VALID'
pool = tf.nn.max_pool(input, filter_shape, strides, padding)

The output shape of pool will be [1, 2, 2, 5], even if padding is changed to 'SAME'.

Quiz: Pooling Practice

Great, now let’s practice doing some pooling operations manually.

Max Pooling
What’s the result of a max pooling operation on the input:

[[[0, 1, 0.5, 10],
[2, 2.5, 1, -8],
[4, 0, 5, 6],
[15, 1, 2, 3]]]
Assume the filter is 2x2 and the stride is 2 for both height and width. The output shape is 2x2x1.

The answering format will be 4 numbers, each separated by a comma, such as: 1,2,3,4.

Work from the top left to the bottom right

SUBMIT

Solution: Pooling Practice

Solution

The correct answer is 2.5,10,15,6. We start with the four numbers in the top left corner. Then we work left-to-right and top-to-bottom, moving 2 units each time.

max(0, 1, 2, 2.5) = 2.5
max(0.5, 10, 1, -8) = 10
max(4, 0, 15, 1) = 15
max(5, 6, 2, 3) = 6

Quiz: Average Pooling

Mean Pooling
What’s the result of an average (or mean) pooling operation on the input:

[[[0, 1, 0.5, 10],
[2, 2.5, 1, -8],
[4, 0, 5, 6],
[15, 1, 2, 3]]]
Assume the filter is 2x2 and the stride is 2 for both height and width. The output shape is 2x2x1.

The answering format will be 4 numbers, each separated by a comma, such as: 1,2,3,4.

Answer to 3 decimal places. Work from the top left to the bottom right

Solution: Average Pooling

Solution

The correct answer is 1.375,0.875,5,4. We start with the four numbers in the top left corner. Then we work left-to-right and top-to-bottom, moving 2 units each time.

mean(0, 1, 2, 2.5) = 1.375
mean(0.5, 10, 1, -8) = 0.875
mean(4, 0, 15, 1) = 5
mean(5, 6, 2, 3) = 4
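
If you want to check this in TensorFlow rather than by hand, tf.nn.avg_pool works just like tf.nn.max_pool but averages each window. A minimal sketch on the same input:

import tensorflow as tf
import numpy as np

# The 4x4x1 input from the quiz, reshaped to 4D (batch, height, width, depth)
x = np.array([
    [0, 1, 0.5, 10],
    [2, 2.5, 1, -8],
    [4, 0, 5, 6],
    [15, 1, 2, 3]], dtype=np.float32).reshape((1, 4, 4, 1))
X = tf.constant(x)

pool = tf.nn.avg_pool(X, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

with tf.Session() as sess:
    print(sess.run(pool))  # [[[[1.375], [0.875]], [[5.], [4.]]]]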

1x1 Convolutions

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Zmzgerm6SjA.mp4

Inception Module

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/SlTm03bEOxA.mp4

Convolutional Network in TensorFlow

Convolutional Network in TensorFlow

It’s time to walk through an example Convolutional Neural Network (CNN) in TensorFlow.

The structure of this network follows the classic structure of CNNs, which is a mix of convolutional layers and max pooling, followed by fully-connected layers.

The code you’ll be looking at is similar to what you saw in the segment on Deep Neural Network in TensorFlow, except we restructured the architecture of this network as a CNN.

Just like in that segment, here you’ll study the line-by-line breakdown of the code. If you want, you can even download the code and run it yourself.

Thanks to Aymeric Damien for providing the original TensorFlow model on which this segment is based.

Time to dive in!

Dataset

You’ve seen this section of code from previous lessons. Here we’re importing the MNIST dataset and using a convenient TensorFlow function to batch, scale, and One-Hot encode the data.

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf

# Parameters
learning_rate = 0.00001
epochs = 10
batch_size = 128

# Number of samples to calculate validation and accuracy
# Decrease this if you're running out of memory to calculate accuracy
test_valid_size = 256

# Network Parameters
n_classes = 10  # MNIST total classes (0-9 digits)
dropout = 0.75  # Dropout, probability to keep units

Weights and Biases

# Store layers weight & bias
weights = {
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    'out': tf.Variable(tf.random_normal([1024, n_classes]))}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))}

Convolutions

Convolution with 3×3 Filter. Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

The above is an example of a convolution with a 3x3 filter and a stride of 1 being applied to data with a range of 0 to 1. The convolution for each 3x3 section is calculated against the weight, [[1, 0, 1], [0, 1, 0], [1, 0, 1]], then a bias is added to create the convolved feature on the right. In this case, the bias is zero. In TensorFlow, this is all done using tf.nn.conv2d() and tf.nn.bias_add().

def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

The tf.nn.conv2d() function computes the convolution against weight W as shown above.

In TensorFlow, strides is an array of 4 elements; the first element in this array indicates the stride for batch and last element indicates stride for features. It’s good practice to remove the batches or features you want to skip from the data set rather than use a stride to skip them. You can always set the first and last element to 1 in strides in order to use all batches and features.

The middle two elements are the strides for height and width respectively. I’ve mentioned stride as one number because you usually have a square stride where height = width. When someone says they are using a stride of 3, they usually mean tf.nn.conv2d(x, W, strides=[1, 3, 3, 1]).

To make life easier, the code is using tf.nn.bias_add() to add the bias. Using tf.add() doesn’t work when the tensors aren’t the same shape.

Max Pooling

Max Pooling with 2x2 filter and stride of 2. Source: http://cs231n.github.io/convolutional-networks/

The above is an example of max pooling with a 2x2 filter and stride of 2. The left square is the input and the right square is the output. The four 2x2 colors in input represents each time the filter was applied to create the max on the right side. For example, [[1, 1], [5, 6]] becomes 6 and [[3, 2], [1, 2]] becomes 3.

def maxpool2d(x, k=2):
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding='SAME')

The tf.nn.max_pool() function does exactly what you would expect, it performs max pooling with the ksize parameter as the size of the filter.

Model

Image from Explore The Design Space video

In the code below, we’re creating 3 layers alternating between convolutions and max pooling, followed by a fully connected and output layer. The transformation of each layer to new dimensions is shown in the comments. For example, the first layer shapes the images from 28x28x1 to 28x28x32 in the convolution step. The next step applies max pooling, turning each sample into 14x14x32. All the layers are applied from conv1 to output, producing 10 class predictions.

def conv_net(x, weights, biases, dropout):
    # Layer 1 - 28*28*1 to 14*14*32
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Layer 2 - 14*14*32 to 7*7*64
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer - 7*7*64 to 1024
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output Layer - class prediction - 1024 to 10
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

Session

Now let’s run it!

# tf Graph input
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)

# Model
logits = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        for batch in range(mnist.train.num_examples//batch_size):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: dropout})

            # Calculate batch loss and accuracy
            loss = sess.run(cost, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: 1.})
            valid_acc = sess.run(accuracy, feed_dict={
                x: mnist.validation.images[:test_valid_size],
                y: mnist.validation.labels[:test_valid_size],
                keep_prob: 1.})

            print('Epoch {:>2}, Batch {:>3} - '
                  'Loss: {:>10.4f} Validation Accuracy: {:.6f}'.format(
                epoch + 1,
                batch + 1,
                loss,
                valid_acc))

    # Calculate Test Accuracy
    test_acc = sess.run(accuracy, feed_dict={
        x: mnist.test.images[:test_valid_size],
        y: mnist.test.labels[:test_valid_size],
        keep_prob: 1.})
    print('Testing Accuracy: {}'.format(test_acc))

That’s it! That is a CNN in TensorFlow. Now that you’ve seen a CNN in TensorFlow, let’s see if you can apply it on your own!

TensorFlow Convolution Layer

Using Convolution Layers in TensorFlow

Let’s now apply what we’ve learned to build real CNNs in TensorFlow. In the exercise below, you’ll be asked to set up the dimensions of the convolution filters, the weights, and the biases. This is in many ways the trickiest part of using CNNs in TensorFlow. Once you have a sense of how to set up the dimensions of these attributes, applying CNNs will be far more straightforward.

Review

You should go over the TensorFlow documentation for 2D convolutions. Most of the documentation is straightforward, except perhaps the padding argument. The padding might differ depending on whether you pass 'VALID' or 'SAME'.

Here are a few more things worth reviewing:

  • Introduction to TensorFlow -> TensorFlow Variables.
  • How to determine the dimensions of the output based on the input size and the filter size (shown below). You’ll use this to determine what the size of your filter should be.
new_height = (input_height - filter_height + 2 * P)/S + 1
new_width = (input_width - filter_width + 2 * P)/S + 1

Instructions

  1. Finish off each TODO in the conv2d function.
  2. Setup the strides, padding and filter weight/bias (F_W and F_b) such that the output shape is (1, 2, 2, 3). Note that all of these except strides should be TensorFlow variables.
"""
Setup the strides, padding and filter weight/bias such that
the output shape is (1, 2, 2, 3).
"""
import tensorflow as tf
import numpy as np
# `tf.nn.conv2d` requires the input be 4D (batch_size, height, width, depth)
# (1, 4, 4, 1)
x = np.array([
[0, 1, 0.5, 10],
[2, 2.5, 1, -8],
[4, 0, 5, 6],
[15, 1, 2, 3]], dtype=np.float32).reshape((1, 4, 4, 1))
X = tf.constant(x)
def conv2d(input):
# Filter (weights and bias)
# The shape of the filter weight is (height, width, input_depth, output_depth)
# The shape of the filter bias is (output_depth,)
# TODO: Define the filter weights `F_W` and filter bias `F_b`.
# NOTE: Remember to wrap them in `tf.Variable`, they are trainable parameters after all.
F_W = ?
F_b = ?
# TODO: Set the stride for each dimension (batch_size, height, width, depth)
strides = [?, ?, ?, ?]
# TODO: set the padding, either 'VALID' or 'SAME'.
padding = ?
# https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#conv2d
# `tf.nn.conv2d` does not include the bias computation so we have to add it ourselves after.
return tf.nn.conv2d(input, F_W, strides, padding) + F_b
out = conv2d(X)

Solution: TensorFlow Convolution Layer

Solution

Here’s how I did it. NOTE: there’s more than 1 way to get the correct output shape. Your answer might differ from mine.

def conv2d(input):
    # Filter (weights and bias)
    F_W = tf.Variable(tf.truncated_normal((2, 2, 1, 3)))
    F_b = tf.Variable(tf.zeros(3))
    strides = [1, 2, 2, 1]
    padding = 'VALID'
    return tf.nn.conv2d(input, F_W, strides, padding) + F_b

I want to transform the input shape (1, 4, 4, 1) to (1, 2, 2, 3). I choose ‘VALID’ for the padding algorithm. I find it simpler to understand and it achieves the result I’m looking for.

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))

Plugging in the values:

out_height = ceil(float(4 - 2 + 1) / float(2)) = ceil(1.5) = 2
out_width  = ceil(float(4 - 2 + 1) / float(2)) = ceil(1.5) = 2

In order to change the depth from 1 to 3, I have to set the output depth of my filter appropriately:

F_W = tf.Variable(tf.truncated_normal((2, 2, 1, 3))) # (height, width, input_depth, output_depth)
F_b = tf.Variable(tf.zeros(3)) # (output_depth)

The input has a depth of 1, so I set that as the input_depth of the filter.

TensorFlow Pooling Layer

Using Pooling Layers in TensorFlow

In the below exercise, you’ll be asked to set up the dimensions of the pooling filters, strides, as well as the appropriate padding. You should go over the TensorFlow documentation for tf.nn.max_pool(). Padding works the same as it does for a convolution.

Instructions

Finish off each TODO in the maxpool function.
Setup the strides, padding and ksize such that the output shape after pooling is (1, 2, 2, 1).

"""
Set the values to `strides` and `ksize` such that
the output shape after pooling is (1, 2, 2, 1).
"""
import tensorflow as tf
import numpy as np
# `tf.nn.max_pool` requires the input be 4D (batch_size, height, width, depth)
# (1, 4, 4, 1)
x = np.array([
[0, 1, 0.5, 10],
[2, 2.5, 1, -8],
[4, 0, 5, 6],
[15, 1, 2, 3]], dtype=np.float32).reshape((1, 4, 4, 1))
X = tf.constant(x)
def maxpool(input):
# TODO: Set the ksize (filter size) for each dimension (batch_size, height, width, depth)
ksize = [?, ?, ?, ?]
# TODO: Set the stride for each dimension (batch_size, height, width, depth)
strides = [?, ?, ?, ?]
# TODO: set the padding, either 'VALID' or 'SAME'.
padding = ?
# https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#max_pool
return tf.nn.max_pool(input, ksize, strides, padding)
out = maxpool(X)

Solution: TensorFlow Pooling Layer

Solution

Here’s how I did it. NOTE: there’s more than 1 way to get the correct output shape. Your answer might differ from mine.

def maxpool(input):
    ksize = [1, 2, 2, 1]
    strides = [1, 2, 2, 1]
    padding = 'VALID'
    return tf.nn.max_pool(input, ksize, strides, padding)

I want to transform the input shape (1, 4, 4, 1) to (1, 2, 2, 1). I choose 'VALID' for the padding algorithm. I find it simpler to understand and it achieves the result I’m looking for.

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))

Plugging in the values:

out_height = ceil(float(4 - 2 + 1) / float(2)) = ceil(1.5) = 2
out_width  = ceil(float(4 - 2 + 1) / float(2)) = ceil(1.5) = 2

The depth doesn’t change during a pooling operation so I don’t have to worry about that.

CNNs - Additional Resources

Additional Resources

There are many wonderful free resources that allow you to go into more depth around Convolutional Neural Networks. In this course, our goal is to give you just enough intuition to start applying this concept on real world problems so you have enough of an exposure to explore more on your own. We strongly encourage you to explore some of these resources more to reinforce your intuition and explore different ideas.

These are the resources we recommend in particular:

  • Andrej Karpathy’s CS231n Stanford course on Convolutional Neural Networks.
  • Michael Nielsen’s free book on Deep Learning.
  • Goodfellow, Bengio, and Courville’s more advanced free book on Deep Learning.

DEEP LEARNING PROJECT

Project Details

Introduction to the Project

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/awEYy2Df3hg.mp4

Starting the Project

Starting the Project

For this assignment, you can find the image_classification folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!

This project contains 3 files:

  • image_classification.ipynb: This is the main file where you will be performing your work on the project.
  • Two helper files, problem_unittests.py and helper.py

Submitting the Project

Submitting the Project
Evaluation

Your project will be reviewed by a Udacity reviewer against the Object Classification Program project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named image_recognition for ease of access:

  • The image_classification.ipynb notebook file with all questions answered and all code cells executed and displaying output along with the .html version of the notebook.
  • All helper files.

Once you have collected these files and reviewed the project rubric, proceed to the project submission page.

PROJECT

Implement this project

  1. download this file from github
  2. open virtualbox
  3. copy the downloaded file to the shared folder C:\Users\SSQ\virtualbox share
  4. type sudo mount -t vboxsf virtualbox_share /mnt/ in ubuntu terminal
  5. type jupyter notebook image_classification.ipynb in the right directory
    error
    ImportError: No module named request

(failed)try with anaconda3 in ubuntu

  1. download anaconda3 from this web
  2. type ./Anaconda3-4.3.1-Linux-x86_64.sh in terminal to run sh file
  3. Anaconda3 will now be installed into this location:/home/ssq/anaconda3

    installation finished.
    Do you wish the installer to prepend the Anaconda3 install location
    to PATH in your /home/ssq/.bashrc ? [yes|no]
    [no] >>>
    You may wish to edit your .bashrc or prepend the Anaconda3 install location:

    $ export PATH=/home/ssq/anaconda3/bin:$PATH

    Thank you for installing Anaconda3!

    Share your notebooks and packages on Anaconda Cloud!
    Sign up for free: https://anaconda.org

  4. export PATH=/home/ssq/anaconda3/bin:$PATH in your .ipynb location
  5. conda create -n tensorflow
  6. error:
    ModuleNotFoundError: No module named 'tqdm'
    method:
    conda install -c conda-forge tqdm
    Package plan for installation in environment /home/ssq/anaconda3:

    The following NEW packages will be INSTALLED:

    tqdm:      4.11.2-py36_0 conda-forge
    

    The following packages will be SUPERCEDED by a higher-priority channel:

    conda:     4.3.14-py36_0             --> 4.2.13-py36_0 conda-forge
    conda-env: 2.6.0-0                   --> 2.6.0-0       conda-forge
    

    Proceed ([y]/n)? y

  7. Anaconda installation
  8. export PATH=/home/ssq/anaconda3/bin:$PATH
  9. source activate tensorflow
  10. conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
    conda config --set show_channel_urls yes
    conda install tensorflow
  11. conda install -c conda-forge tensorflow

    conda install pandas matplotlib jupyter notebook scipy scikit-learn
    pip install tensorflow
  12. pip3 install --upgrade pip

(failed)pip3 install tensorflow in ubuntu

  1. from this web to create a new vb
  2. install the VirtualBox Guest Additions and enable the bidirectional clipboard
  3. set the shared file from this blog and type sudo mount -t vboxsf virtualbox_share /mnt/
  4. type sudo apt install python3-pip in terminal to install python3
  5. pip3 install tensorflow
  6. python3 and test

(success)anaconda3 install in Win7 tensorflow

  1. download anaconda3 from this web
  2. conda create -n tensorflow python=3.5
    activate tensorflow
    conda install pandas matplotlib jupyter notebook scipy scikit-learn
    pip install tensorflow
  3. (tensorflow) C:\Users\SSQ>cd C:\Users\SSQ\virtualbox share\image-classification

  4. (tensorflow) C:\Users\SSQ\virtualbox share\image-classification>jupyter notebook image_classification.ipynb
  5. ModuleNotFoundError: No module named 'tqdm'
    method:
    (tensorflow) C:\Users\SSQ\virtualbox share\image-classification>conda install tqdm

anaconda3 install in win7 tensorflow-gpu

  1. view this page and this blog
  2. cuda_8.0.61_windows in win7
  3. cudnn-8.0-windows7-x64-v6.0
  4. conda create -n tensorflow-gpu python=3.5
    activate tensorflow-gpu
    conda install pandas matplotlib jupyter notebook scipy scikit-learn
    pip install tensorflow-gpu

Submission

Image Classification

Project Submission

Image Classification

Introduction

In this project, you’ll classify images from the CIFAR-10 dataset. The dataset consists of airplanes, dogs, cats, and other objects. You’ll preprocess the dataset, then train a convolutional neural network on all the samples. You’ll normalize the images, one-hot encode the labels, and build a convolutional layer, max pooling layer, and fully connected layer. At the end, you’ll see the network’s predictions on the sample images.
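
As a rough idea of what the preprocessing involves, here is a sketch only (not the project’s required implementation; the helper names are illustrative, not the notebook’s) of min-max normalization and one-hot encoding with NumPy:

import numpy as np

def normalize(x):
    # Scale pixel values from the range [0, 255] to [0, 1]
    return x / 255.0

def one_hot_encode(labels, n_classes=10):
    # Turn integer class labels into one-hot vectors
    encoded = np.zeros((len(labels), n_classes))
    encoded[np.arange(len(labels)), labels] = 1
    return encoded

print(normalize(np.array([0.0, 127.5, 255.0])))  # [0.  0.5 1. ]
print(one_hot_encode([3, 0], n_classes=4))       # [[0. 0. 0. 1.] [1. 0. 0. 0.]]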

Getting the project files

The project files can be found in our public GitHub repo, in the image-classification folder. You can download the files from there, but it’s better to clone the repository to your computer.

This way you can stay up to date with any changes we make by pulling the changes to your local repository with git pull.

Submission
  1. Ensure you’ve passed all the unit tests in the notebook.
  2. Ensure you pass all points on the rubric.
  3. When you’re done with the project, please save the notebook as an HTML file. You can do this by going to the File menu in the notebook and choosing “Download as” > HTML. Ensure you submit both the Jupyter Notebook and its HTML version together.
  4. Package the “dlnd_image_classification.ipynb”, “helper.py”, “problem_unittests.py”, and the HTML file into a zip archive, or push the files from your GitHub repo.
  5. Hit Submit Project below!

Submit Your Project

submit
view submission
reference

Career: Interview Practice

Machine Learning Specializations

Capstone Proposal

PROJECT

Writing up a Capstone proposal

Overview

Capstone Proposal Overview

Please note that once your Capstone Proposal has been submitted and you have passed the evaluation, you have to submit your Capstone project using the same proposal that you submitted. We do not allow the Capstone Proposal and the Capstone project to differ in terms of dataset and approach.

In this capstone project proposal, prior to completing the following Capstone Project, you will leverage what you’ve learned throughout the Nanodegree program to author a proposal for solving a problem of your choice by applying machine learning algorithms and techniques. A project proposal encompasses seven key points:

  • The project’s domain background — the field of research from which the project is derived;
  • A problem statement — a problem being investigated for which a solution will be defined;
  • The datasets and inputs — data or inputs being used for the problem;
  • A solution statement — the solution proposed for the given problem;
  • A benchmark model — some simple or historical model or result to compare the defined solution to;
  • A set of evaluation metrics — functional representations for how the solution can be measured;
  • An outline of the project design — how the solution will be developed and results obtained.
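
To make the “benchmark model” and “evaluation metrics” points concrete, here is a minimal sketch comparing a trivial benchmark against a proposed model on a single metric. It uses scikit-learn (already part of the course environment); the dataset, models, and metric are purely illustrative, not a requirement of the proposal.

from __future__ import print_function

from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Illustrative dataset standing in for whatever data your proposal uses.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Benchmark model: always predict the most frequent class.
benchmark = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
# Proposed solution: whatever learner you intend to develop.
model = LogisticRegression().fit(X_train, y_train)

# Evaluation metric: the functional representation both models are scored on.
print('benchmark F1: {:.3f}'.format(f1_score(y_test, benchmark.predict(X_test))))
print('proposed  F1: {:.3f}'.format(f1_score(y_test, model.predict(X_test))))
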
Capstone Proposal Highlights

The capstone project proposal is designed to introduce you to writing proposals for major projects. Typically, before you begin working on a solution to a problem, a proposal is written to your peers, advisor, manager, etc., to outline the details of the problem, your research, and your approach to a solution.

Things you will learn by completing this project proposal:

  • How to research a real-world problem of interest.
  • How to author a technical proposal document.
  • How to organize a proposed workflow for designing a solution.

Description

Capstone Proposal Description

Think about a technical field or domain that you are passionate about, such as robotics, virtual reality, finance, natural language processing, or even artificial intelligence (the possibilities are endless!). Then, choose an existing problem within that domain that you are interested in which you could solve by applying machine learning algorithms and techniques. Be sure that you have collected all of the resources needed (such as datasets, inputs, and research) to complete this project, and make the appropriate citations wherever necessary in your proposal. Below are a few suggested problem areas you could explore if you are unsure what your passion is:

In addition, you may find a technical domain (along with the problem and dataset) in competitions on platforms such as Kaggle or Devpost. This can be helpful for discovering a particular problem you may be interested in solving as an alternative to the suggested problem areas above. In many cases, some of the requirements for the capstone proposal are already defined for you when choosing from these platforms.

To determine whether your project and the problem you want to solve fit Udacity’s vision of a Machine Learning Capstone Project, please refer to the capstone proposal rubric and the capstone project rubric and make a note of each rubric criterion you will be evaluated on. A satisfactory project will have a proposal that clearly satisfies these requirements.

Software and Data Requirements

Software Requirements

Your proposed project must be written in Python 2.7. Given the free-form nature of the machine learning capstone, the software and libraries you will need to successfully complete your work will vary depending on the chosen application area and problem definition. Because of this, it is imperative that all necessary software and libraries you consider using in your capstone project are accessible and clearly documented. Please note that proprietary software, software that requires private licenses, or software behind a paywall or login account should be avoided.

Data Requirements

Every machine learning capstone project will most certainly require some form of dataset or input data structure (input text files, images, etc.). Similar to the software requirements above, the data you are considering must either be publicly accessible or provided by you during the submission process, and private or proprietary data should not be used without expressed permission. Please take into consideration the file size of your data — while there is no strict upper limit, input files that are excessively large may require reviewers longer than an acceptable amount of time to acquire all of your project files. This can take away from the reviewer’s time that could be put towards evaluating your proposal. If the data you are considering fits the criteria of being too large, consider whether you could work with a subset of the data instead, or provide a representative sample of the data.
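
If you do need to trim an oversized dataset, a fixed-seed random sample is usually enough for the reviewer to reproduce your work. Below is a minimal sketch with pandas; the file names are illustrative, and whether a plain random sample is representative enough depends on your data.

import pandas as pd

# Load the full dataset (hypothetical file name).
full = pd.read_csv('full_dataset.csv')

# Keep a reproducible 10% sample so the submission stays small.
sample = full.sample(frac=0.10, random_state=42)
sample.to_csv('dataset_sample.csv', index=False)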

Ethics

Udacity’s A/B Testing course, as part of the Data Analyst Nanodegree, has a segment that discusses the sensitivity of data and the expectation of privacy from those whose information has been collected. While most data you find available to the public will not have any ethical complications, it is extremely important that you are considering where the data you are using came from, and whether that data contains any sensitive information. For example, if you worked for a bank and wanted to use customers’ bank statements as part of your project, this would most likely be an unethical choice of data and should be avoided.

If you have any questions regarding the nature of a dataset or software you intend to use for the capstone project, please send an email to machine-support@udacity.com with the subject “Capstone Project Dataset/Software Inquiry”.

Proposal Guidelines

Report Guidelines

Your project submission will be evaluated on the written proposal that is submitted. Additionally, depending on the project you are proposing, other materials such as the data being used will be evaluated. It is expected that the proposal contains enough detail, documentation, analysis, and discussion to adequately reflect the work you intend to complete for the project. Because of this, it is extremely important that the proposal is written in a professional, standardized way, so those who review your project’s proposal are able to clearly identify each component of your project in the report. Without a properly written proposal, your project cannot be sufficiently evaluated. A project proposal template is provided for you to understand how a project proposal should be structured. We strongly encourage students to have a proposal that is approximately two to three pages in length.

The Machine Learning Capstone Project proposal should be treated no different than a written research paper for academics. Your goal is to ultimately present the research you’ve discovered into the respective problem domain you’ve chosen, and then clearly articulate your intended project to your peers. The narrative found in the project proposal template provides a “proposal checklist” that will aid you in fully completing a documented proposal. Please make use of this resource!

Submitting the Project

Evaluation

Your project will be reviewed by a Udacity reviewer against the Capstone Project Proposal rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

At minimum, your submission will be required to have the files listed below. If your submission method of choice is uploading an archive (*.zip), please take into consideration the total file size. You will need to include:

  • A project proposal, in PDF format only, with the name proposal.pdf, addressing each of the seven key points of a proposal. The recommended page length for a proposal is approximately two to three pages.
  • Any additional supporting material such as datasets, images, or input files that are necessary for your project and proposal. If these files are too large and you are uploading your submission, instead provide appropriate means of acquiring the necessary files in an included README.md file.
    Once you have collected these files and reviewed the project rubric, proceed to the project submission page.

Submission

Capstone Proposal

Project Submission

In this capstone project proposal, prior to completing the following Capstone Project, you will leverage what you’ve learned throughout the Nanodegree program to author a proposal for solving a problem of your choice by applying machine learning algorithms and techniques. A project proposal encompasses seven key points:

  • The project’s domain background — the field of research from which the project is derived;
  • A problem statement — a problem being investigated for which a solution will be defined;
  • The datasets and inputs — data or inputs being used for the problem;
  • A solution statement — the solution proposed for the given problem;
  • A benchmark model — some simple or historical model or result to compare the defined solution to;
  • A set of evaluation metrics — functional representations for how the solution can be measured;
  • An outline of the project design — how the solution will be developed and results obtained.

Think about a technical field or domain that you are passionate about, such as robotics, virtual reality, finance, natural language processing, or even artificial intelligence (the possibilities are endless!). Then, choose an existing problem within that domain that you are interested in which you could solve by applying machine learning algorithms and techniques. Be sure that you have collected all of the resources needed (such as datasets, inputs, and research) to complete this project, and make the appropriate citations wherever necessary in your proposal. Below are a few suggested problem areas you could explore if you are unsure what your passion is:

In addition, you may find a technical domain (along with the problem and dataset) in competitions on platforms such as Kaggle or Devpost. This can be helpful for discovering a particular problem you may be interested in solving as an alternative to the suggested problem areas above. In many cases, some of the requirements for the capstone proposal are already defined for you when choosing from these platforms.

Evaluation

Your project will be reviewed by a Udacity reviewer against the Capstone Project Proposal rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

At minimum, your submission will be required to have the files listed below. If your submission method of choice is uploading an archive (*.zip), please take into consideration the total file size. You will need to include:

  • A project proposal, in PDF format only, with the name proposal.pdf, addressing each of the seven key points of a proposal. The recommended page length for a proposal is approximately two to three pages.
  • Any additional supporting material such as datasets, images, or input files that are necessary for your project and proposal. If these files are too large and you are uploading your submission, instead provide appropriate means of acquiring the necessary files in an included README.md file.
    Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
I’m Ready!

When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.

If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.

What’s Next?

You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!

submit

Capstone Project

PROJECT

Machine Learning Capstone Project

Overview

Capstone Project Overview

In this capstone project, you will leverage what you’ve learned throughout the Nanodegree program to solve a problem of your choice by applying machine learning algorithms and techniques. You will first define the problem you want to solve and investigate potential solutions and performance metrics. Next, you will analyze the problem through visualizations and data exploration to have a better understanding of what algorithms and features are appropriate for solving it. You will then implement your algorithms and metrics of choice, documenting the preprocessing, refinement, and postprocessing steps along the way. Afterwards, you will collect results about the performance of the models used, visualize significant quantities, and validate/justify these values. Finally, you will construct conclusions about your results, and discuss whether your implementation adequately solves the problem.

Capstone Project Highlights

This project is designed to prepare you for delivering a polished, end-to-end solution report of a real-world problem in a field of interest. When developing new technology, or deriving adaptations of previous technology, properly documenting your process is critical for both validating and replicating your results.

Things you will learn by completing this project:

  • How to research and investigate a real-world problem of interest.
  • How to accurately apply specific machine learning algorithms and techniques.
  • How to properly analyze and visualize your data and results for validity.
  • How to document and write a report of your work.

Description

Capstone Description

Think about a technical field or domain that you are passionate about, such as robotics, virtual reality, finance, natural language processing, or even artificial intelligence (the possibilities are endless!). Then, choose an existing problem within that domain that you are interested in which you could solve by applying machine learning algorithms and techniques. Be sure that you have collected all of the resources needed (such as data sets) to complete this project, and make the appropriate citations wherever necessary in your report. Below are a few suggested problem areas you could explore if you are unsure what your passion is:

In addition, you may find a technical domain (along with the problem and dataset) in competitions on platforms such as Kaggle or Devpost. This can be helpful for discovering a particular problem you may be interested in solving as an alternative to the suggested problem areas above. In many cases, some of the requirements for the capstone proposal are already defined for you when choosing from these platforms.

Note: For students who have enrolled before October 17th, we strongly encourage that you look at the Capstone Proposal project that is available as an elective before this project. If you have an idea for your capstone project but aren’t ready to begin working on the implementation, or even if you want to get feedback on how you will approach a solution to your problem, you can use the Capstone Proposal project to have a peer-review from one of our Capstone Project reviewers!

For whichever application area or problem you ultimately investigate, there are five major stages to this capstone project which you will move through and subsequently document. Each stage plays a significant role in the development life cycle, beginning with a problem definition and finishing with a polished, working solution. As you make your way through developing your project, be sure that you are also working on a rough draft of your project report, as it is the most important aspect of your submission!

To determine whether your project and the problem you want to solve fit Udacity’s vision of a Machine Learning Capstone Project, please refer to the capstone project rubric and make a note of each rubric criterion you will be evaluated on. A satisfactory project will have a report that encompasses each stage and component of the rubric.

Software and Data Requirements

Software Requirements

Your project must be written in Python 2.7. Given the free-form nature of the machine learning capstone, the software and libraries you will need to successfully complete your work will vary depending on the chosen application area and problem definition. Because of this, it is imperative that all necessary software and libraries used in your capstone project are accessible to the reviewer and clearly documented. Information regarding the software and libraries your project makes use of should be included in the README along with your submission. Please note that proprietary software, software that requires private licenses, or software behind a paywall or login account should be avoided.

Data Requirements

Every machine learning capstone project will most certainly require some form of dataset or input data structure (input text files, images, etc.). Similar to the software requirements above, the data you use must either be publicly accessible or provided by you during the submission process, and private or proprietary data should not be used without expressed permission. Please take into consideration the file size of your data — while there is no strict upper limit, input files that are excessively large may require reviewers longer than an acceptable amount of time to acquire all of your project files and/or execute the provided development code. This can take away from the reviewer’s time that could be put towards evaluating your submission. If the data you are working with fits the criteria of being too large, consider whether you can work with a subset of the data instead, or provide a representative sample of the data which the reviewer may use to verify the solution explored in the project.

Ethics

Udacity’s A/B Testing course, as part of the Data Analyst Nanodegree, has a segment that discusses the sensitivity of data and the expectation of privacy from those whose information has been collected. While most data you find available to the public will not have any ethical complications, it is extremely important that you are considering where the data you are using came from, and whether that data contains any sensitive information. For example, if you worked for a bank and wanted to use customers’ bank statements as part of your project, this would most likely be an unethical choice of data and should be avoided.

Report Guidelines

Report Guidelines

Your project submission will be evaluated primarily on the report that is submitted. It is expected that the project report contains enough detail, documentation, analysis, and discussion to adequately reflect the work you completed for your project. Because of this, it is extremely important that the report is written in a professional, standardized way, so those who review your project submission are able to clearly identify each component of your project in the report. Without a properly written report, your project cannot be sufficiently evaluated. A project report template is provided for you to understand how a project report should be structured. We strongly encourage students to have a report that is approximately nine to fifteen pages in length.

The Machine Learning Capstone Project report should be treated no different than a written research paper for academics. Your goal is to ultimately present the research you’ve discovered into the respective problem domain you’ve chosen, and then discuss each stage of the project as they are completed. The narrative found in the project report template provides a “report checklist” that will aid you in staying on track for both your project and the documentation in your report. Each stage can be found as a section that will guide you through each component of the project development life cycle. Please make use of this resource!

Example Reports

Example Machine Learning Capstone Reports

Included in the project files for the Capstone are three example reports that were written by students just like yourselves. Because the written report for your project will be how you are evaluated, it is absolutely critical that you are producing a clear, detailed, well-written report that adequately reflects the work that you’ve completed for your Capstone. Following along with the Capstone Guidelines will be very helpful as you begin writing your report.

Our first example report comes from graduate Martin Bede, whose project design in the field of computer vision, named “Second Sight”, was to create an Android application that would extract text from the device’s camera and read it aloud. Martin’s project cites the growing concern of vision loss as motivation for developing software that can aid those unable to see or read certain print.

Our second example report comes from an anonymous graduate whose project design in the field of image recognition was to implement a Convolutional Neural Network (CNN) to train on the CIFAR-10 dataset and successfully identify different objects in new images. This student describes with thorough detail how a CNN can be used quite effectively as a descriptor-learning image recognition algorithm.

Our third example report comes from graduate Naoki Shibuya, who took advantage of the pre-curated robot motion planning “Plot and Navigate a Virtual Maze” project. Pay special attention to the emphasis Naoki places on discussing the methodology and results: Projects relying on technical implementations require valuable observations and visualizations of how the solution performs under various circumstances and constraints.

Each example report given has many desirable qualities we expect from students when completing the Machine Learning Capstone project. Once you begin writing your project report for whichever problem domain you choose, be sure to reference these examples whenever necessary!

Submitting the Project

Evaluation

Your project will be reviewed by a Udacity reviewer against the Machine Learning Capstone project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

At minimum, your submission will be required to have the files listed below. If your submission method of choice is uploading an archive (*.zip), please take into consideration the total file size. You will need to include:

  • Your capstone proposal document as proposal.pdf if you have completed the pre-requisite Capstone Proposal project. Please also include your review link in the student submission notes.
  • A project report (in PDF format only) addressing the five major project development stages. The recommended page length for a project report is approximately nine to fifteen pages. Please do not export an iPython Notebook as PDF for your project report.
  • All development Python code used for your project that is required to reproduce your implemented solution and result. Your code should be in a neat and well-documented format. Using iPython Notebooks is strongly encouraged for development.
  • A README documentation file which briefly describes the software and libraries used in your project, including any necessary references to supporting material. If your project requires setup/startup, ensure that your README includes the necessary instructions.
  • Any additional supporting material such as datasets, images, or input files that are necessary for your project’s development and implementation. If these files are too large and you are uploading your submission, instead provide appropriate means of acquiring the necessary files in your included README.
    Once you have collected these files and reviewed the project rubric, proceed to the project submission page.

Submission

Capstone Project

Project Submission

In this capstone project, you will leverage what you’ve learned throughout the Nanodegree program to solve a problem of your choice by applying machine learning algorithms and techniques. You will first define the problem you want to solve and investigate potential solutions and performance metrics. Next, you will analyze the problem through visualizations and data exploration to have a better understanding of what algorithms and features are appropriate for solving it. You will then implement your algorithms and metrics of choice, documenting the preprocessing, refinement, and postprocessing steps along the way. Afterwards, you will collect results about the performance of the models used, visualize significant quantities, and validate/justify these values. Finally, you will construct conclusions about your results, and discuss whether your implementation adequately solves the problem.

Think about a technical field or domain that you are passionate about, such as robotics, virtual reality, finance, natural language processing, or even artificial intelligence (the possibilities are endless!). Then, choose an existing problem within that domain that you are interested in which you could solve by applying machine learning algorithms and techniques. Be sure that you have collected all of the resources needed (such as datasets, inputs, and research) to complete this project, and make the appropriate citations wherever necessary in your proposal. Below are a few suggested problem areas you could explore if you are unsure what your passion is:

In addition, you may find a technical domain (along with the problem and dataset) in competitions on platforms such as Kaggle or Devpost. This can be helpful for discovering a particular problem you may be interested in solving as an alternative to the suggested problem areas above. In many cases, some of the requirements for the capstone proposal are already defined for you when choosing from these platforms.

Note: For students who have enrolled before October 17th, we strongly encourage that you look at the Capstone Proposal project that is available as an elective before this project. If you have an idea for your capstone project but aren’t ready to begin working on the implementation, or even if you want to get feedback on how you will approach a solution to your problem, you can use the Capstone Proposal project to have a peer-review from one of our Capstone Project reviewers!

For whichever application area or problem you ultimately investigate, there are five major stages to this capstone project which you will move through and subsequently document. Each stage plays a significant role in the development life cycle, beginning with a problem definition and finishing with a polished, working solution. As you make your way through developing your project, be sure that you are also working on a rough draft of your project report, as it is the most important aspect of your submission!

Evaluation

Your project will be reviewed by a Udacity reviewer against the Machine Learning Capstone project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Submission Files

At minimum, your submission will be required to have the files listed below. If your submission method of choice is uploading an archive (*.zip), please take into consideration the total file size. You will need to include:

  • Your capstone proposal document as proposal.pdf if you have completed the pre-requisite Capstone Proposal project. Please also include your review link in the student submission notes.
  • A project report (in PDF format only) addressing the five major project development stages. The recommended page length for a project report is approximately nine to fifteen pages. Please do not export an iPython Notebook as PDF for your project report.
  • All development Python code used for your project that is required to reproduce your implemented solution and result. Your code should be in a neat and well-documented format. Using iPython Notebooks is strongly encouraged for development.
  • A README documentation file which briefly describes the software and libraries used in your project, including any necessary references to supporting material. If your project requires setup/startup, ensure that your README includes the necessary instructions.
  • Any additional supporting material such as datasets, images, or input files that are necessary for your project’s development and implementation. If these files are too large and you are uploading your submission, instead provide appropriate means of acquiring the necessary files in your included README.
I’m Ready!

When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.

If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.

What’s Next?

You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!

Supporting Materials

Videos Zip File
THE MNIST DATABASE of handwritten digits
Machine Learning is Fun! Part 3: Deep Learning and Convolutional Neural Networks
selfdrivingcars

Run summary: 2x2 conv kernel, 25 conv outputs, 500 fully connected units, batch size 128, 200 epochs (success)
Testing Accuracy: 0.8081980186934564
First result

1 conv layer
1 fully connected layer

# One conv/max-pool layer followed by one fully connected layer.
x = conv2d_maxpool(x, 25, (2, 2), (1, 1), (2, 2), (2, 2))
x = flatten(x)
x = fully_conn(x, 500)
x = tf.nn.dropout(x, keep_prob)
out = output(x, 10)

# Hyperparameters
epochs = 200
batch_size = 128
keep_probability = 0.5
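
For context, here is a minimal sketch of what a conv2d_maxpool helper like the one called above could look like in the TF 1.x API. The argument order (x, conv_num_outputs, conv_ksize, conv_strides, pool_ksize, pool_strides) is assumed from the calls; the real implementation lives in the project notebook.

import tensorflow as tf

def conv2d_maxpool(x, conv_num_outputs, conv_ksize, conv_strides,
                   pool_ksize, pool_strides):
    # Convolution weights: kernel height/width, input depth, output depth.
    depth = int(x.get_shape()[-1])
    weights = tf.Variable(tf.truncated_normal(
        [conv_ksize[0], conv_ksize[1], depth, conv_num_outputs], stddev=0.05))
    bias = tf.Variable(tf.zeros(conv_num_outputs))

    # Convolution + bias + nonlinearity.
    x = tf.nn.conv2d(x, weights,
                     strides=[1, conv_strides[0], conv_strides[1], 1],
                     padding='SAME')
    x = tf.nn.relu(tf.nn.bias_add(x, bias))

    # Max pooling.
    return tf.nn.max_pool(x,
                          ksize=[1, pool_ksize[0], pool_ksize[1], 1],
                          strides=[1, pool_strides[0], pool_strides[1], 1],
                          padding='SAME')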

save as capstone_model.meta

Testing Accuracy: 0.808136261212445

Second result

3 conv layers
1 fully connected layer

# Three stacked conv/max-pool layers followed by one fully connected layer.
x = conv2d_maxpool(x, 10, (2, 2), (1, 1), (2, 2), (2, 2))
x = conv2d_maxpool(x, 10, (2, 2), (1, 1), (2, 2), (2, 2))
x = conv2d_maxpool(x, 10, (2, 2), (1, 1), (2, 2), (2, 2))
x = flatten(x)
x = fully_conn(x, 500)
x = tf.nn.dropout(x, keep_prob)
out = output(x, 10)

# Hyperparameters
epochs = 200
batch_size = 128
keep_probability = 0.5
validation_accuracy: 0.7073670029640198

Testing Accuracy: 0.8427798201640447
With the full dataset:
Testing Accuracy: 0.8851721937559089

Submit Your Project

ML Stanford